Analysis of Data Quality
The Correlates of War data sets are a product of the Correlates of War Project (COW), founded in 1963. COW's goal is to “facilitate the collection, dissemination, and use of accurate and reliable quantitative data in international relations.” [Correlates of War Project] Of the COW data sets, we focused on four: National Materials Capabilities, Militarized Interstate Disputes, Alliances and Trade. Because these data sets come from the COW project, a long-running academic effort whose data is widely used in international relations research, we consider their authority, accuracy and objectivity to be high. All the data sets use a country code field (ccode), a numeric ID assigned to each country. This field is consistent across the COW data sets, so it can be used to join them.
National Materials Capabilities
The overall data quality of the NMC data set is very good. There are roughly 14,000 entries, and 89% of them have no missing values. There is one entry for each country per year. The data is also accurate in tracking changes to the state system: as countries dissolve and new ones form, their series end and begin accordingly. For example, the graph below shows the CINC of Austria-Hungary from 1900 to 1918, the end of WWI, when the Austro-Hungarian Empire was dissolved. Immediately afterwards, there are separate data points for Austria and Hungary. The same holds for many other countries, which only have data once they have declared independence or been created.
# NMC codes missing values as -9; recode them to NA so they are not plotted as values
NMC_test <- NMC_orig
for (col in c("cinc", "irst", "milex", "milper", "pec", "tpop", "upop")) {
  NMC_test[[col]][NMC_test[[col]] == -9] <- NA
}
NMC_test <- filter(NMC_test, NMC_test$year %in% c(1900:2007))
test <- c("Austria-Hungary", "Austria", "Hungary")
test_ccode <- member_alliances$ccode[match(test, member_alliances$state_name)]
NMC_test <- filter(NMC_test, NMC_test$ccode %in% test_ccode)
ggplot() +
scale_x_continuous(name="Year") +
scale_y_continuous(name="CINC") +
labs(color ='Country Abbreviation')+
geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=.09, fill=Conflict),alpha=0.15) +
geom_line(data = NMC_test, aes(x = year, y = cinc, color = stateabb, group = stateabb)) +
ggtitle("CINC for Austria-Hungary") +
theme_classic()+
scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
scale_color_brewer(palette="Paired")+
theme(plot.title = element_text(hjust = .5), legend.position = "bottom")
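The completeness figure above and the one-entry-per-country-per-year structure can both be checked in a few lines. The sketch below is a minimal illustration, not the code we used: it assumes missing values are coded as -9 or NA, as in the cleaning step above, and runs on a small made-up data frame whose column names mirror the NMC file. The function name nmc_quality is hypothetical.

```r
# Hypothetical helper: data-quality summary for an NMC-style data frame.
# Missing values may be coded as -9 (the NMC convention) or as NA.
nmc_quality <- function(df, value_cols) {
  vals <- as.matrix(df[value_cols])
  # TRUE where a value is missing; is.na() comes first so no NA leaks through
  missing <- is.na(vals) | vals == -9
  list(
    # Share of rows in which none of the value columns is missing
    complete_share = mean(rowSums(missing) == 0),
    # TRUE if every (ccode, year) pair appears exactly once
    one_row_per_country_year = anyDuplicated(df[c("ccode", "year")]) == 0
  )
}

# Made-up example: two countries observed in two years
toy <- data.frame(
  ccode = c(2, 2, 200, 200),
  year  = c(1900, 1901, 1900, 1901),
  milex = c(10, 12, -9, 8),
  tpop  = c(76, 77, 41, NA)
)
str(nmc_quality(toy, c("milex", "tpop")))
```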

Below is a heat map of CINC values for a few countries, in which some CINC values are missing. More specifically, the heat map shows the CINC values of countries that were devastated by the Holocaust and by the fight for freedom from Communism. These countries went through a period of major reformation and recovery. In the data set, the periods in which these countries were devastated show up as missing CINC values.
topCas <- c("Netherlands", "Yugoslavia", "Lithuania", "Poland", "Austria", "Hungary", "Romania", "Estonia", "Luxembourg")
topCas_ccode <- member_alliances$ccode[match(topCas, member_alliances$state_name)]
NMC_cas <- filter(NMC_orig, NMC_orig$year %in% c(1900:2012))
NMC_cas <- filter(NMC_cas, NMC_cas$ccode %in% topCas_ccode)
# Attach the country name that corresponds to each country code
NMC_cas$country <- topCas[match(NMC_cas$ccode, topCas_ccode)]
ggplot(NMC_cas, aes(country, year, fill = cinc)) + geom_tile()+
xlab("Country") +
ylab("Year") +
scale_fill_viridis() +
ggtitle("CINC Heatmap for Countries with the most Holocaust Casualties") +
theme_classic()+
theme(plot.title = element_text(hjust = 0.5)) +
coord_flip() +
labs(fill = 'CINC')
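The gaps in the heat map can also be listed rather than read off the plot. A minimal sketch, assuming a gap means that a country has no row for a year between its first and last observation (rows whose CINC is NA or -9 could be dropped beforehand to treat those as gaps too); find_year_gaps is a hypothetical name and the example years are illustrative.

```r
# Hypothetical helper: runs of absent years between a country's first and
# last observation, returned as a data frame of start and end years.
find_year_gaps <- function(years) {
  years <- sort(unique(years))
  missing <- setdiff(seq(min(years), max(years)), years)
  if (length(missing) == 0) {
    return(data.frame(start = numeric(0), end = numeric(0)))
  }
  # A run starts where the previous missing year is not adjacent,
  # and ends where the next missing year is not adjacent
  breaks <- diff(missing) != 1
  data.frame(start = missing[c(TRUE, breaks)],
             end   = missing[c(breaks, TRUE)])
}

# Illustrative: a country observed 1937-1939 and again from 1945
find_year_gaps(c(1937:1939, 1945:1950))
```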

Alliances
The alliance data comes in six versions of the same information, each serving a different purpose. The member-level file contains one observation for each alliance member initiation, and a companion file has one observation for each year a member is in the alliance. The dyadic file has one observation for each dyadic alliance initiation and also lists the details of each alliance; as with the member-level data, a companion file has one observation for each year the dyad is in the alliance. Finally, the directed-dyad file has two observations per dyadic alliance: one detailing the terms of the alliance of country A towards country B, and another detailing the terms of country B towards country A. Again, a companion file has one observation for each year the alliance is in effect. In all the data we examined in our analysis, the information is consistent with historical events.
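To make the relationship between the member-level and directed-dyad versions concrete: an alliance with n members yields n(n - 1) directed dyads, since each ordered pair (A towards B, and B towards A) gets its own row. The sketch below builds those pairs from a member list; to_directed_dyads is a hypothetical helper, the treaty terms that the real directed-dyad file attaches to each row are left out, and the codes are COW country codes used only as an example (2 = United States, 200 = United Kingdom, 220 = France).

```r
# Hypothetical helper: all ordered pairs of distinct members of one alliance
to_directed_dyads <- function(members) {
  pairs <- expand.grid(ccode1 = members, ccode2 = members)
  pairs <- pairs[pairs$ccode1 != pairs$ccode2, ]   # drop self-pairs
  rownames(pairs) <- NULL
  pairs
}

dyads <- to_directed_dyads(c(2, 200, 220))
dyads   # 6 rows: each country paired with each of the other two
```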
Some of the inconsistencies that I noticed involve alliances that were still in effect as of 12/31/2012, the date through which this data set was last updated. In some of the data sets, an ongoing alliance has dyad_end_year, the field that records the year in which the alliance was terminated, set to 2012, while in others it is set to NA. Where dyad_end_year is 2012, it is hard to tell whether the alliance ended in 2012 or was still ongoing.
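One way to keep this ambiguity visible instead of silently resolving it is to classify end years explicitly. A minimal sketch, assuming 2012 is the last update year and that a value above it (such as the 2016 placeholder our alliance code assigns to NA end years) means the alliance is ongoing; alliance_status is a hypothetical helper.

```r
# Hypothetical helper: classify alliance end years.
#   NA or later than the last update -> "Ongoing"
#   earlier than the last update     -> "Ended"
#   equal to the last update year    -> "Ambiguous" (ended that year, or ongoing)
alliance_status <- function(end_year, last_update = 2012) {
  status <- rep("Ambiguous", length(end_year))
  status[is.na(end_year) | end_year > last_update] <- "Ongoing"
  status[!is.na(end_year) & end_year < last_update] <- "Ended"
  status
}

alliance_status(c(1945, NA, 2012, 2016))
```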
Militarized Interstate Disputes (MID)
The MID data is thorough and well collated. Because conflicts or wars can have multiple dimensions, the MID data is broken into two main data sets. MIDA contains one observation per dispute, with details about its start and end, the fatalities, the outcome, the settlement, the highest action taken, and so on. The other part of the data set, MIDB, contains one observation per participant in a dispute, which adds a participant-level dimension to the data.
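Because the two parts share a dispute identifier, participant-level information from MIDB can be summarized and attached to each dispute in MIDA. The sketch below uses tiny made-up tables, not actual MID records; the column names dispnum, styear, ccode and sidea are assumptions modeled on the MID files and may differ between versions.

```r
# Made-up miniature tables: MIDA has one row per dispute,
# MIDB one row per participant in a dispute.
mida <- data.frame(dispnum = c(1, 2), styear = c(1900, 1901))
midb <- data.frame(dispnum = c(1, 1, 1, 2, 2),
                   ccode   = c(200, 255, 365, 2, 731),
                   sidea   = c(1, 0, 1, 1, 0))

# Count participants per dispute, then attach the count to the dispute table
n_actors <- aggregate(ccode ~ dispnum, data = midb, FUN = length)
names(n_actors)[2] <- "n_actors"
disputes <- merge(mida, n_actors, by = "dispnum")
disputes
```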

As seen above, the authors of the data set categorized certain subjective variables and then encoded the categories as numbers. For instance, the “Outcome” of a dispute is coded as follows [Palmer, Glenn]:
Victory for side A: 1
Victory for side B: 2
Yield by side A: 3
Yield by side B: 4
Stalemate: 5
Compromise: 6
Released: 7
Unclear: 8
Joins ongoing war: 9
Missing: -9
This created consistency across all recorded observations, which made the data easier to work with.
Another feature that made the data effective is that certain data points are explicitly marked as unclear or missing, so depending on the context of a graph we could easily filter them out.
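A lookup table built from the list above makes the coded outcomes readable again and keeps the unclear and missing categories easy to drop. A minimal sketch; the vector and function names are hypothetical, and the codes are the ones listed above.

```r
# Outcome codes from the list above, keyed by their numeric code as text
outcome_labels <- c(
  "1" = "Victory for side A", "2" = "Victory for side B",
  "3" = "Yield by side A",    "4" = "Yield by side B",
  "5" = "Stalemate",          "6" = "Compromise",
  "7" = "Released",           "8" = "Unclear",
  "9" = "Joins ongoing war",  "-9" = "Missing"
)

# Codes not in the table come back as NA
decode_outcome <- function(code) unname(outcome_labels[as.character(code)])

outcomes <- decode_outcome(c(1, 8, -9, 6))
# Drop the unclear and missing outcomes when a plot does not need them
clear_outcomes <- outcomes[!outcomes %in% c("Unclear", "Missing")]
clear_outcomes
```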
Main Analysis
National Materials Capabilities
# For the ratio calculations, treat missing values (-9 or NA) as 0
NMC <- NMC_orig
for (col in c("cinc", "irst", "milex", "milper", "pec", "tpop", "upop")) {
  NMC[[col]][NMC[[col]] == -9 | is.na(NMC[[col]])] <- 0
}
all_year <- c(1900:2007)
# For each year, divide each of the six components (columns 4:9) by that year's
# total across all countries, giving each country's share (ratio) of the total
NMC_ratios <- bind_rows(lapply(all_year, function(year_t) {
  yr <- filter(NMC, year == year_t)
  totals <- colSums(yr[, 4:9])
  for (i in 4:9) {
    yr[[i]] <- yr[[i]] / totals[[i - 3]]
  }
  yr
}))
cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
National Materials Capabilities (NMC) measures the power of a country based on six values: total population, urban population, military personnel, military expenditures, iron and steel production, and energy consumption. NMC is purely a measure of military and economic means of influence rather than diplomacy or other forms of influence.
CINC (Composite Index of National Capability) is a composite score of a country's power: for each of the six factors, a country's value is divided by the total value for all countries in that year, and CINC is the average of these six ratios. Below is the CINC score for today's major powers, which also participated in the major wars of the past. The periods of the wars mentioned above are highlighted.
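As a worked illustration of this definition, the sketch below computes CINC for three made-up countries in a single year: each component is divided by its column total, and CINC is the row mean of the six resulting shares. It deliberately ignores how the official computation handles missing components, and the numbers are invented.

```r
components <- c("milex", "milper", "irst", "pec", "tpop", "upop")

# Made-up component values for three countries in one year
comp <- data.frame(
  country = c("A", "B", "C"),
  milex = c(50, 30, 20), milper = c(10, 10, 20), irst = c(5, 3, 2),
  pec   = c(40, 40, 20), tpop   = c(100, 300, 100), upop = c(20, 20, 10)
)

# Each country's share of the world total for each component
shares <- sweep(as.matrix(comp[components]), 2,
                colSums(comp[components]), "/")

# CINC: the unweighted mean of the six shares
comp$cinc <- rowMeans(shares)
comp[c("country", "cinc")]   # A = 0.375, B = 0.375, C = 0.25
```

Because every share column sums to 1, the CINC values of all countries in a year also sum to 1, which is a handy sanity check on the real data.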
countries <- c("United States of America", "United Kingdom", "France", "Russia","Germany", "Japan", "China")
country_code <- member_alliances$ccode[match(countries, member_alliances$state_name)]
NMC_mp <- filter(NMC, NMC$year %in% all_year)
NMC_mp <- filter(NMC_mp, NMC_mp$ccode %in% country_code)
# Attach the country name that corresponds to each country code
NMC_mp$country <- countries[match(NMC_mp$ccode, country_code)]
ggplot() +
scale_x_continuous(name="Year") +
scale_y_continuous(name="CINC") +
labs(color ='Country')+
geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp$cinc), fill=Conflict),alpha=0.15) +
geom_line(data = NMC_mp, aes(x = year, y = cinc, color = country, group = stateabb)) +
ggtitle("CINC by Year for Major Powers Today") +
theme_classic()+
scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
scale_color_brewer(palette="Paired")+
theme(plot.title = element_text(hjust = .5), legend.position = "bottom")

As you can see, CINC changes constantly. Interestingly, the US's CINC spikes right after WWI and WWII, wars in whose outcomes the US played a major role. The US's CINC score also increases during the Korean War. After the Korean War, the US's CINC score decreases continuously through the Vietnam War, which the US lost. At the end of WWI, Russia's CINC dips, but it quickly bounces back to its pre-war level. During the Vietnam War, Russia supported North Vietnam, and its CINC score rises above that of the US as North Vietnam gains ground in the war. Russia's CINC score also drops towards the end of the Cold War, as the satellite states gained their independence and the USSR was dissolved (Battlefield: Vietnam).
To explore these trends further, we restricted the data to the major participants in each war. For example, for WWI we looked at the CINC values of the major players among the Allied Powers and the Central Powers from a few years before to a few years after the war. We repeated this process for all the events listed above. Below are CINC graphs using this approach for WWI and WWII.
allied <- c("United States of America", "United Kingdom", "Russia", "Japan", "Italy")
allied_ccode <- member_alliances$ccode[match(allied, member_alliances$state_name)]
central <- c("Germany", "Turkey", "Austria-Hungary", "Romania", "Bulgaria")
central_ccode <- member_alliances$ccode[match(central, member_alliances$state_name)]
WWI_range = c(1904:1930)
WWI<- filter(NMC, NMC$year %in% WWI_range)
alliedP<- filter(WWI, WWI$ccode %in% allied_ccode)
alliedP$side = "Allied Powers"
for( i in c(1:length(allied_ccode))){
alliedP$country[alliedP$ccode == allied_ccode[i]] <- paste(allied[i],"(Allied)", sep = " ")
}
centralP<- filter(WWI, WWI$ccode %in% central_ccode)
centralP$side = "Central Powers"
for( i in c(1:length(central_ccode))){
centralP$country[centralP$ccode == central_ccode[i]] <- paste(central[i], "(Central)", sep = " ")
}
WWI <-rbind(alliedP, centralP)
ww1 <- ggplot() +
labs(color ='Country')+
xlab("Year") +
ylab("CINC")+
# Shade the war years with a single rectangle (geom_rect(data = d) would draw one copy per row of d)
annotate("rect", xmin=1914, xmax=1918, ymin=0, ymax=.4, alpha=0.25, fill="salmon") +
geom_line(data = WWI, aes(x = year, y = cinc, color = country, group = country)) +
facet_wrap(~side) +
theme_classic()+
scale_color_brewer(palette="Paired")+
ggtitle("CINC Score: WWI Major Players")+
theme(plot.title = element_text(hjust = .5),legend.position="right")
allies <- c("United States of America", "United Kingdom", "France", "Russia", "Australia","China")
allies_ccode <- member_alliances$ccode[match(allies, member_alliances$state_name)]
axis <- c("Germany", "Italy", "Japan", "Hungary", "Romania", "Bulgaria")
axis_ccode <- member_alliances$ccode[match(axis, member_alliances$state_name)]
WWII_range = c(1934:1950)
WWII<- filter(NMC, NMC$year %in% WWII_range)
alliedP2<- filter(WWII, WWII$ccode %in% allies_ccode)
alliedP2$side = "Allies"
for( i in c(1:length(allies_ccode))){
alliedP2$country[alliedP2$ccode == allies_ccode[i]] <- paste(allies[i], "(Allies)", sep = " ")
}
axisP<- filter(WWII, WWII$ccode %in% axis_ccode)
axisP$side = "Axis"
for( i in c(1:length(axis_ccode))){
axisP$country[axisP$ccode == axis_ccode[i]] <- paste(axis[i], "(Axis)", sep = " ")
}
WWII <-rbind(alliedP2, axisP)
ww2 <- ggplot() +
labs(color ='Country')+
xlab("Year") +
ylab("CINC")+
annotate("rect", xmin=1939, xmax=1945, ymin=0, ymax=.4, alpha=0.25, fill="paleturquoise3") +
geom_line(data = WWII, aes(x = year, y = cinc, color = country, group = stateabb)) +
facet_wrap(~side) +
theme_classic()+
scale_color_brewer(palette="Paired")+
ggtitle("CINC Score: WWII Major Players")+
theme(plot.title = element_text(hjust = .5),legend.position="right")
grid.arrange(ww1, ww2, nrow=2)
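The WWI and WWII blocks above repeat the same filter-and-label steps with different country lists and year ranges. A hypothetical helper could factor that out; the sketch below is base R, takes a named vector of country codes in place of the member_alliances lookup, and runs on a made-up NMC extract (the CINC numbers are invented). Its output has the side and country columns that the plotting code expects, although the labels differ slightly (for example "Allied" rather than "Allied Powers").

```r
# Hypothetical helper: keep the listed countries within a year window and
# label each row with its side and a "Country (Side)" display name.
# `sides` is a named list of country-name vectors; `code_lookup` maps
# country names to COW country codes.
war_window <- function(nmc, sides, years, code_lookup) {
  rows <- lapply(names(sides), function(side) {
    codes <- code_lookup[sides[[side]]]
    sub <- nmc[nmc$year %in% years & nmc$ccode %in% codes, ]
    if (nrow(sub) == 0) return(NULL)
    sub$side <- rep(side, nrow(sub))
    sub$country <- paste0(names(codes)[match(sub$ccode, codes)], " (", side, ")")
    sub
  })
  do.call(rbind, rows)
}

# Made-up extract: three countries, three years
lookup <- c("United Kingdom" = 200, "Germany" = 255, "Russia" = 365)
nmc_demo <- data.frame(
  ccode = rep(c(200, 255, 365), each = 3),
  year  = rep(1913:1915, times = 3),
  cinc  = c(0.10, 0.11, 0.12, 0.14, 0.15, 0.13, 0.09, 0.08, 0.07)
)
ww1_demo <- war_window(nmc_demo,
                       list(Allied = c("United Kingdom", "Russia"), Central = "Germany"),
                       1914:1915, lookup)
ww1_demo
```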

As mentioned, after WWI the US's CINC spiked and then decreased steadily until the beginning of WWII. Russia's CINC dropped but quickly rose again and stayed on a relatively upward trend until WWII. Unlike the US and Russia, the United Kingdom's CINC decreased steadily after the war. Italy's and Japan's CINC remained steady. Among the Central Powers, Germany's CINC dropped after WWI and did not recover within the period shown. Turkey's, Romania's and Bulgaria's CINC remained steady. The Austro-Hungarian CINC series ends after the war because the Austro-Hungarian Empire was dissolved at the end of the war (Royde-Smith).
For WWII, we see a similar pattern for the US: its CINC peaks at the end of WWII and decreases steadily until the Korean War. Russia also shows a pattern similar to WWI: its CINC reaches a low point towards the end of WWII and then increases steadily until the Korean War. The UK follows a similar pattern, with its CINC peaking right after the war and then decreasing steadily throughout the Cold War period. Among the Axis powers, Germany's and Japan's CINC drop off.
To explore these patterns, we examined the components that make up the CINC. We focused on the major powers because they had the most drastic changes during this period. We looked at both the actual values and the ratios: the absolute values gradually increased over time, while the ratios show each country's performance relative to the other countries in each year. Looking at the ratios helped us see trends that were not easy to spot in the overall values. Below are plots of three of the six CINC components (iron and steel production, military expenditures and military personnel), showing both the values and the ratios.
grid.arrange(arrangeGrob(ias, ias_r, mex, mex_r, mip, mip_r + theme(plot.title = element_text(hjust = .5), legend.position="none"),nrow=3, ncol =2),mylegend, heights=c(10,1))

#grid.arrange(ias, ias_r, mex, mex_r, mip, mip_r, urp, urp_r, top, top_r, nrow=6)
The US's iron and steel production ratio follows the same pattern as its CINC during this period: it peaks towards the end of WWI and WWII and decreases between the two wars. Until about the beginning of the Vietnam War, roughly 1955, the US dominated world iron and steel production, so this component had a large impact on the US's CINC score. During most wars the US had the largest military production, but it lost that position towards the end of the Vietnam War, when Russia surpassed it. The United Kingdom maintained its iron and steel production, but because the US and Russia were increasing theirs, the UK's ratio has decreased steadily since the 1900s, much like the pattern in its CINC score.
Looking at military expenditures, we see that the US and Russia invested significantly more in their militaries throughout the Cold War. On the other hand, both countries decreased their military personnel after it peaked at the end of WWII. These findings are consistent with the Cold War arms race, in which the US and Russia invested heavily in military technology but did not fight each other directly in large-scale battles. The US's military expenditures ratio peaks in the same year as its CINC score and its iron and steel production ratio.
In the few years before WWII, Germany's military expenditures ratio increased rapidly, and its military personnel ratio increased drastically in the year before the war. Although less drastically, Japan and China follow a similar pattern: military investment increased significantly a few years before WWII and the Korean War, respectively, and the military personnel ratio increased sharply right before each war. This suggests that in the years before these wars, these countries were investing in and preparing their militaries for war. For the countries on the reactive side (the US, Russia and the UK), the military production and military expenditures ratios increased only during the war.
# Actual component values (NMC) and ratios (NMC_ratios) for 1970-2000
NMC_v<- filter(NMC, year %in% c(1970:2000))
NMC_ratios_v<- filter(NMC_ratios, NMC_ratios$year %in% c(1970:2000))
#names <- c("North Korea", "South Korea", "Afghanistan" , "Vietnam", "Republic of Vietnam")
#code<- c(731, 732,700, 816, 817)
names <- c("Iraq")
code <- c(645)
NMC_v<- filter(NMC_v, NMC_v$ccode %in% code)
NMC_ratios_v<- filter(NMC_ratios_v, NMC_ratios_v$ccode %in% code)
for( i in c(1:length(code))){
NMC_ratios_v$country[NMC_ratios_v$ccode == code[i] ]<- names[i]
NMC_v$country[NMC_v$ccode == code[i] ]<- names[i]
}
a<- ggplot() +
scale_x_continuous(name="Year") +
scale_y_continuous(name="Ratio") +
geom_line(data = NMC_ratios_v, aes(x = year, y = milex), colour = "indianred3") +
ggtitle("Iraq Military Expenditures Ratio") +
annotate("rect", xmin=1990, xmax=1991, ymin=0, ymax=.025, alpha=0.15) +
theme_classic() +
theme(plot.title = element_text(hjust = .5, size =10 ), axis.text.x =element_text(size =6), axis.text.y =element_text(size =6), axis.title.x =element_text(size =8), axis.title.y =element_text(size =8), legend.position="none")
b<- ggplot() +
scale_x_continuous(name="Year") +
scale_y_continuous(name="Ratio") +
geom_line(data = NMC_ratios_v, aes(x = year, y = milper), colour= "royalblue3") +
ggtitle("Iraq Military Personnel Ratio") +
annotate("rect", xmin=1990, xmax=1991, ymin=0, ymax=.05, alpha=0.15) +
theme_classic() +
theme(plot.title = element_text(hjust = .5, size =10 ), axis.text.x =element_text(size =6), axis.text.y =element_text(size =6), axis.title.x =element_text(size =8), axis.title.y =element_text(size =8), legend.position="none")
grid.arrange(a,b,nrow = 1)

We wanted to see whether the same pattern was present in other conflicts. In the Gulf War, Iraq invaded Kuwait, which drew international condemnation, and the US and other nations joined forces to stop Iraq (Persian Gulf War). Before the invasion, Iraq's military expenditures and personnel ratios were increasing. The gray shaded box marks the period of the Gulf War (1990-1991). In the 1980s, Iraq's military expenditures ratio increased drastically, and there was a sudden spike in military personnel right before the start of the war.
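To compare this kind of build-up across conflicts, it can be reduced to one number per country and component: the relative change in a ratio over the years just before the war begins. A minimal sketch; the function name, the five-year window, and the values are assumptions for illustration, not results from the NMC data.

```r
# Hypothetical helper: relative change in a ratio series between the year
# `k` years before a war starts and the year just before it starts.
prewar_change <- function(years, values, war_start, k = 5) {
  before   <- values[years == war_start - k]
  at_start <- values[years == war_start - 1]
  (at_start - before) / before
}

# Made-up military-expenditure ratios for 1980-1990 (not the real Iraq values)
yrs  <- 1980:1990
milx <- c(0.010, 0.011, 0.012, 0.013, 0.014, 0.015,
          0.016, 0.017, 0.018, 0.020, 0.022)
prewar_change(yrs, milx, war_start = 1990, k = 5)   # (0.020 - 0.015) / 0.015
```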
ct <- c("United States of America", "United Kingdom", "France", "Russia","Germany", "Italy", "Japan", "China")
ct_ccode <- member_alliances$ccode[match(ct, member_alliances$state_name)]
NMC_ct <- filter(NMC, NMC$year %in% all_year)
NMC_ct <- filter(NMC_ct, NMC_ct$ccode %in% ct_ccode)
# Attach the country name that corresponds to each country code
NMC_ct$country <- ct[match(NMC_ct$ccode, ct_ccode)]
a<- ggplot(NMC_ct, aes(country, year, fill = cinc)) + geom_tile()+
xlab("Country") +
ylab("Year") +
scale_fill_viridis() +
ggtitle("CINC Heatmap for Major Powers") +
theme_classic()+
theme(plot.title = element_text(hjust = 0.5))+
coord_flip() +
labs(fill = 'CINC')
topCas <- c("Netherlands", "Yugoslavia", "Lithuania", "Poland", "Austria", "Hungary", "Romania", "Estonia", "Luxembourg")
topCas_ccode <- member_alliances$ccode[match(topCas, member_alliances$state_name)]
NMC_cas <- filter(NMC_orig, year %in% all_year)
NMC_cas <- filter(NMC_cas, NMC_cas$ccode %in% topCas_ccode)
# Attach the country name that corresponds to each country code
NMC_cas$country <- topCas[match(NMC_cas$ccode, topCas_ccode)]
b<- ggplot(NMC_cas, aes(country, year, fill = cinc)) + geom_tile()+
xlab("Country") +
ylab("Year") +
scale_fill_viridis() +
ggtitle("CINC Heatmap for Countries with the most Holocaust Casualties") +
theme_classic()+
theme(plot.title = element_text(hjust = 0.5)) +
coord_flip() +
labs(fill = 'CINC')
grid.arrange(a,b, nrow = 1 )

Finally, using the heat maps, we looked at the impact of a war on CINC. On the left is a heat map of the major powers: France, Germany and Japan have missing CINC values at the end of World War II. These periods of missing values correspond to each country's recovery period after the war. Next, we looked at which other countries had gaps in their CINC data. Most of the countries with gaps are Eastern European countries that were heavily affected by the Holocaust and by the struggle to end Communism in the region (Royde-Smith). This suggests that countries going through intense destruction or reformation have no CINC information for that period.
Conclusions & Next Steps
To get into more detail and context around certain characteristics, we want to look at specific events such as Pearl Harbor, Japan's invasion of Malaya and other pivotal moments in wars to see how those events affect CINC and its components.
A limitation of NMC is that many factors beyond the six NMC components determine a nation's power. One major factor that is not considered is diplomatic relations between countries, which play a major role in the prevention and conclusion of conflicts. With this data, it was not possible to account for them.
Another consideration is differences in policy between countries. We see that US military expenditures have been increasing since the Cold War, while Russia's military expenditures drop sharply at the end of the Cold War. Since the end of the Cold War, Russia has been cutting military spending (Royde-Smith). Even with its participation in the Afghanistan War, Russia's military expenditures have not increased. By contrast, in the US today, politicians are proposing federal budgets with increased military spending. This difference reflects the countries' differing policies. Thus, countries' reactions to events vary drastically depending on their policies, which makes it hard to identify an overall pattern.
Another drawback of NMC is that it cannot account for changes in global priorities. For example, with growing concern about climate change and scarce natural resources, and with advances in technology, iron and steel production might decline sharply in the future, so it may no longer be a valid measure of power. Similarly, advances in technology could reduce the need for military personnel. NMC cannot take such policy concerns and changes into account when measuring national power.
Alliances
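# Alliance-years from 1900 to 2012, with the length of each alliance
dyad_al_year <- filter(dyad_al_year, year %in% c(1900:2012))
dyad_al_year$length <- dyad_al_year$dyad_end_year - dyad_al_year$dyad_st_year
dyad_al_year$conflict <- "0"
dyad_al_year$count <- 1
# Reshape the four alliance-type indicators (defense ... entente) into long format
# and keep only the types that apply to each dyad-year
dir_alliances <- gather(dir_al_year, treaty_type, indicator, defense:entente)
dir_alliances <- dir_alliances[!dir_alliances$indicator %in% 0, ]
# Ongoing alliances (end year NA) get a placeholder end year past the data range
dir_alliances$dyad_end_year[is.na(dir_alliances$dyad_end_year)] <- 2016
dir_alliances <- dir_alliances[dir_alliances$year > 1900, ]
alliance_count <- dir_alliances[, c("ccode1", "state_name1", "year", "treaty_type")]
alliance_count$count <- 1
# Number of alliances of each type in effect for each country and year
gp_ct <- aggregate(count ~ ccode1 + state_name1 + year + treaty_type, data = alliance_count, FUN = sum)
# Label each year with the conflict period (row of d) it falls in; "0" means none
gp_ct$Conflict <- "0"
for (i in seq_along(d$x1)) {
  gp_ct$Conflict[gp_ct$year >= d$x1[i] & gp_ct$year <= d$x2[i]] <- as.character(i)
}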
WWI was triggered by the assassination of Archduke Franz Ferdinand of Austria. His death set off a diplomatic crisis that drew in countries not involved in the original conflict. When Austria-Hungary declared war on Serbia over the Archduke's death, Russia stepped in to defend Serbia. Once Russia entered the conflict, Germany was drawn in by its alliance with Austria-Hungary. Germany then invaded Belgium, and in response the United Kingdom mobilized because of its treaty commitment to Belgium. This pattern continued until all the major powers of the world were involved in a devastating war. Such alliances were a major cause of World War I. Since then the number of alliances has only grown and continues to grow. For this reason, we wanted to look at alliances and see how they change during wars.
Below is a boxplot of the total number of alliances in effect each year between any two countries. The median number of alliances jumped significantly during WWII, continued to grow during the Cold War, and has remained relatively level since then. An interesting pattern is that the median number of alliances increased more in the 1-3 years before the end of a war. You can see this pattern with WWI, the Korean War, the Vietnam War and the end of the Cold War. Although the Cold War was a state of severe political conflict rather than open war, there were many regional battles and the threat of a large-scale war was constant. The number of alliances increased significantly from the start of the Cold War to its end.
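The line inside each box is the median of the counts for that year. In gp_ct each observation is the number of alliances of one treaty type that one country has in effect in a given year; the sketch below leaves out the treaty type and uses made-up counts to show the per-year median computed directly.

```r
# Made-up counts of alliances in effect per country and year
counts <- data.frame(
  year  = c(1938, 1938, 1938, 1946, 1946, 1946),
  ccode = c(2, 200, 365, 2, 200, 365),
  count = c(1, 3, 2, 5, 7, 4)
)
# Per-year median: the middle line of each yearly box in the boxplot
medians <- aggregate(count ~ year, data = counts, FUN = median)
medians   # 1938: 2, 1946: 5
```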
ggplot() +
xlab("Year") +
ylab("Count")+
geom_boxplot(data = gp_ct, aes(x = as.factor(year), y = count, fill = Conflict)) +
ggtitle("Total Alliances by Year") +
theme_classic()+
theme(plot.title = element_text(hjust = .5, size = 20), axis.text.x = element_text(angle = 90, size = 10), axis.text.y = element_text(size = 15), axis.title.y = element_text(size = 15), axis.title.x = element_text(size = 15), legend.position="bottom", legend.text = element_text(size=15), legend.title = element_text(size=15))+
scale_fill_manual(values=c("white", "lightsteelblue3", "pink3", "paleturquoise3", "lightsteelblue2", "lightsteelblue4","salmon" ), labels = mylables)

Next we looked at the types of alliances formed during this time. The COW data reports four types of alliances: defense, neutrality, entente and non-aggression. In a defense alliance, the member states agree to defend one or more members of the alliance in the event of a conflict. In a neutrality alliance, the members agree to remain neutral towards the other members of the alliance. In a non-aggression alliance, the members agree to take no military action against one another. Finally, in an entente alliance the states agree to consult one another if a crisis occurs (Formal Alliances).
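In the alliance files each type is a separate 0/1 indicator column, and the gather() call in our alliance code reshapes them into one row per dyad and type. The same reshaping in base R on a made-up two-row table shows why an alliance that is, say, both a defense pact and an entente is counted once under each type. The column names are assumed to match the four types above (with nonaggression written as one word); the COW codes 2, 200 and 70 (United States, United Kingdom, Mexico) serve only as placeholders.

```r
types <- c("defense", "neutrality", "nonaggression", "entente")

# Made-up directed-dyad rows with one 0/1 indicator per alliance type
al <- data.frame(ccode1 = c(2, 2), ccode2 = c(200, 70),
                 defense = c(1, 1), neutrality = c(0, 0),
                 nonaggression = c(1, 0), entente = c(1, 0))

# Long format: one row per dyad and alliance type that applies to it
long <- do.call(rbind, lapply(types, function(type) {
  rows <- al[al[[type]] == 1, c("ccode1", "ccode2")]
  rows$treaty_type <- rep(type, nrow(rows))
  rows
}))
rownames(long) <- NULL
long   # 4 rows: defense twice, nonaggression and entente once each
```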
The plots below show the number of alliances by alliance type. The top row shows the number of new alliances formed each year and the bottom row shows the number of alliances terminated that year. Note that if an alliance is formed among four states, it appears as six new alliances in the data set, because there is an alliance between each pair of the four members. Similarly, if an alliance among four states is terminated, that counts as six terminated alliances.
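The four-member example follows from the pair count n(n - 1)/2, i.e. choose(n, 2). A short sketch that lists the pairs explicitly; the member labels are placeholders. In a directed-dyad file each pair appears once in each direction, so the corresponding count there would be n(n - 1).

```r
# Number of undirected dyads created by an alliance with n members
dyads_for <- function(n) choose(n, 2)

# The six pairs of a four-member alliance, one row per pair
members <- c("A", "B", "C", "D")
pairs <- t(combn(members, 2))
pairs
dyads_for(4)   # 6
```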
dir_al_0 <- filter(dir_al, dir_al$dyad_st_year %in% c(1900:2012))
all_st <- gather(dir_al_0, treaty_type, indicator, defense:entente)
all_st <- all_st[!all_st$indicator %in% 0,]
# Ongoing alliances (end year NA) get a placeholder end year past the data range
all_st$dyad_end_year[is.na(all_st$dyad_end_year)] <- 2016
al_st_count <- all_st[, c(3,5,8,11,15)]
al_st_count$count <- 1
gp_st <- aggregate(cbind(count) ~ dyad_st_year+treaty_type, data =al_st_count, FUN = sum )
a <- ggplot() +
xlab("Year") +
ylab("Count")+
geom_bar(data = al_st_count, aes(x = dyad_st_year)) +
geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=450, fill=Conflict),alpha=0.2)+
ggtitle("Number of Alliances Formed") +
theme_classic()+
theme(legend.position="bottom", plot.title = element_text(hjust = .5)) +
scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) + facet_wrap(~treaty_type, nrow = 1 )
b<- ggplot() +
xlab("Year") +
ylab("Count")+
geom_bar(data = al_st_count[al_st_count$dyad_end_year < 2016, ], aes(x = dyad_end_year)) +
ggtitle("Number of Alliances Terminated")+
geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=450, fill=Conflict),alpha=0.2)+
theme_classic()+
theme(legend.position="bottom", plot.title = element_text(hjust = .5)) +
scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) + facet_wrap(~treaty_type, nrow = 1)
grid.arrange(a,b, nrow =2)

Most alliances were formed at the end of WWII and during the Cold War, and the most frequently formed types were defense and entente. Surprisingly, years with a large increase in the number of alliances formed also show an increase in the number of alliances terminated. To better understand what types of treaties were formed and why they ended, we looked at individual countries.
The following charts are all organized the same way: each shows a timeline from the start of an alliance to either its end (shown in red) or 2012 if the alliance was still in effect as of December 31, 2012 (shown in blue). The charts are faceted by alliance type because many of the types overlap; for example, one alliance can be both a defense and an entente alliance, so we separated the types for a clearer visual representation. We focused this part of the analysis on the United States because, as a major power, it is involved in many of the military alliances throughout history.
United States of America
al_us_yr <- filter(dir_al_year, dir_al_year$state_name1 %in% "United States of America")
al_us_yr <- gather(al_us_yr, Treaty, indicator, defense:entente)
al_us_yr <- al_us_yr[!al_us_yr$indicator %in% 0,]
# Ongoing alliances (end year NA) get a placeholder end year past the data range
al_us_yr$dyad_end_year[is.na(al_us_yr$dyad_end_year)] <- 2016
al_us_yr <- al_us_yr[, c(1,5,8,11,14,16)]
al_us_yr$count = 1
# Ended before the last update (2012) vs. still in effect (2012 or the 2016 placeholder)
al_us_yr$Status <- ifelse(al_us_yr$dyad_end_year < 2012, "Ended", "Ongoing")
ggplot() +
xlab("Year") +
ylab("Country")+
geom_point(data =al_us_yr, aes(x=year, y = state_name2, color =Status), alpha = .5) +
geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin="Afghanistan", ymax="Zimbabwe", fill=Conflict),alpha=0.15) +
ggtitle("US Alliances ")+
theme_classic()+
scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) +
facet_wrap(~Treaty, nrow = 1 ) +
theme(legend.position="right", plot.title = element_text(hjust = .5))

Looking at the alliances for the US, we see that most defense alliances are still in effect today. A handful of alliances with South American countries ended towards the end of WWII, but the US immediately entered a different alliance with those same countries. The treaty in effect with the South American countries is the Inter-American Treaty of Reciprocal Assistance (Rio Pact), under which an attack against one member is considered an attack against all the American countries in the alliance. This alliance was created in 1947 and continues today (The Rio Pact at a Glance).
You can also see a similar pattern of ongoing alliances for the defense, entente and nonaggression treaty types. NATO, a defense, entente and nonaggression alliance, was formed in 1949 and is still in effect today. NATO involves 28 countries and accounts for the high number of alliances formed in 1949 for these three types (Formation of NATO).
The entente alliances follow a similar pattern, in which an alliance ended and was immediately re-formed. For a few countries, an entente alliance was formed towards the end of the Korean War and ended a few years after the end of the Vietnam War. Most of the countries that follow this pattern are in Asia or Australia, which is reasonable because they were participants in the Vietnam War. The graph also shows that the defense and entente alliance between the US and Cuba ended during the Vietnam War, at a time when Cuba was providing military support to the Vietnamese. Also during the Vietnam War, there was a short-lived neutrality alliance between the US and countries that participated in the war. This was the International Agreement on the Neutrality of Laos, which started in 1961 and was terminated two years later when the Democratic Republic of Vietnam violated its terms (Vietnam War History).
Conclusions & Next Steps
Because we are mainly focusing on large-scale wars that involved many countries, many treaties were created and broken during the periods we study. For example, the Warsaw Pact was created as a counterweight to NATO, which was formed after WWII. The US, Great Britain and their allies became part of NATO, and the Soviet Union and its allies became part of the Warsaw Pact. After the Warsaw Pact and the USSR dissolved, many former Warsaw Pact members joined NATO. With NATO and the United Nations in place, it is hard to say which countries will participate in the next war. For example, before the beginning of the war in Afghanistan, the UN Security Council had to authorize the United States and its NATO allies to organize an offensive against al-Qaeda (Witte). This kind of regulation makes it hard to predict how future wars will play out. Interestingly, once a treaty falls apart, its members try to join another treaty, which is why we see spikes in the median number of alliances towards the end of the wars.
One difficulty with this data set is that it is impossible to tell which alliances were part of a larger treaty. For example, a data point for an alliance between the US and the UK in 1967 gives no indication of whether it was NATO or some other treaty. This also made it hard to tell when a country joined an existing alliance. When Germany joined NATO, there were data points for alliances between Germany and the NATO members, but it is not easy to discern that Germany joined NATO without some outside research.
The other drawback of this data set is that it only considers formal military alliances. It does not capture other kinds of ties, such as membership in the United Nations or its Security Council, or trade agreements. For example, Japan is not in any military alliance currently, but it has very close ties to the United States today, and this relationship is not captured in the data set.
Militarized Interstate Disputes
The data is rich and contains many dimensions for each militarized conflict in the last century. These include the outcome, settlement, number of fatalities, minimum duration, highest action taken and hostility level. As a first step, we selected a few variables and plotted them in a parallel coordinate plot (PCP) to look for correlations.
library(ggplot2)
library(dplyr)
library(grid)
library(gridExtra)
library(RColorBrewer)
library(GGally)
#path = "/home/vaguiar/col_hw/vis_hw/final/data/"
MIDA = read.csv(file="./data/MID/MIDA_4.01.csv", sep= ",")
# Label a year with the war period it falls in. The Korean and Vietnam Wars
# are checked before the overlapping Cold War, so they take precedence.
war_year <- function(x){
  if (x >= 1914 && x <= 1918) return('WWI')
  if (x >= 1939 && x <= 1945) return('WWII')
  if (x >= 1950 && x <= 1953) return('Korean War')
  if (x >= 1955 && x <= 1975) return('Vietnam War')
  if (x >= 1947 && x <= 1991) return('Cold War')
  if (x >= 2001 && x <= 2010) return('War in Afghanistan')
  'No War'
}
MIDA$wartime <- sapply(MIDA$StYear, war_year)
##PCP PLot
alphabending = 0.5
war <- ggparcoord(MIDA[MIDA$StYear>1900 & MIDA$wartime!='No War',], columns = c(9:11, 14:16),
scale = "uniminmax",
alphaLines = alphabending,
groupColumn = "wartime",
title="Correlations In War Time Conflicts") +
theme_classic() +
theme(legend.position = "bottom")
#guides(fill=guide_legend(title="War Period",
# title.position = "bottom",
# nrow=1))
peace <- ggparcoord(MIDA[MIDA$StYear>1900 & MIDA$wartime=='No War',], columns = c(9:11, 14:16),
scale = "uniminmax",
alphaLines = alphabending,
groupColumn = "wartime",
title="Correlations In Peace Time Conflicts") +
theme_classic() +
theme(legend.position = "bottom")
#guides(fill=guide_legend(title="War Period",
# title.position = "bottom",
# nrow=1))
grid.arrange(war, peace, nrow = 2)

In the war-time plot, most of the data points for Fatality, Settlement and Outcome gravitate towards the top half of their axes, and those three variables appear strongly correlated. Minimum Duration is partly skewed, and the remaining variables are evenly distributed. In the peace-time plot, the first three variables also appear fairly evenly distributed.
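`ggparcoord(scale = "uniminmax")` rescales each selected column independently to [0, 1] as (x - min) / (max - min). An axis position therefore shows where a dispute sits within that variable's observed range. The base-R sketch below reproduces that rescaling with a hypothetical `uniminmax()` helper; it is not GGally's internal code. It also shows a caveat: if the `-9` missing code is left in, it becomes the column minimum and pushes every real code towards the top of the axis. That effect may contribute to the clustering described above.

```r
# Hypothetical helper: rescale a numeric vector to [0, 1] as (x - min) / (max - min).
# NA values are ignored when finding the range.
uniminmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

fatality <- c(-9, 1, 3, 6)              # made-up codes, including the -9 missing flag
with_missing <- uniminmax(fatality)      # -9 becomes 0, real codes squeezed upward
fatality[fatality == -9] <- NA           # recode missing before scaling
without_missing <- uniminmax(fatality)
print(round(with_missing, 3))
print(without_missing)
```

A constant column would make the denominator zero and return `NaN`, so a column like that should be dropped before plotting.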
Below, we dig a little deeper into variables like Fatalities, Settlements and Outcomes of the conflicts. Each graph is overlaid with the time period of the major wars. Through this exploration, we try to spot possible predictors of a war.
Fatalities In Disputes Leading to War
# Overlay w/ rectangle theme
y2_high =9
d = data.frame(x1 = c(1914, 1939, 1947, 1950, 1955, 2001),
               x2 = c(1918, 1945, 1991, 1953, 1975, 2010),
               y1 = 0, y2 = y2_high,  # recycled to every row
               Conflict = c("WWI", "WWII", "Cold War", "Korean War", "Vietnam War", "Afghanistan War"),
               r = 1:6)
#Adding Labels for Facet titles
facet_names <- as_labeller(c(
'0' = "None",
'1' = "1-25 deaths",
'2' = "26-100 deaths",
'3' = "101-250 deaths",
'4' = "251-500 deaths",
'5' = "501-999 deaths",
'6' = "More than 999 deaths",
'-9' = "Missing Data"
))
ggplot() +
geom_bar(data=MIDA[MIDA$EndYear>1900 & MIDA$Fatality!=0 & MIDA$Fatality!=-9,], aes(x = EndYear), stat="count") +
facet_wrap(~Fatality, nrow= 8, labeller=facet_names) +
scale_fill_manual(values = myPal) +
geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=y1, ymax=y2, fill=Conflict), alpha=0.15) +
ggtitle("Number Of Disputes With Fatalities By Year") +
xlab("End Year") +
ylab("Count")

In the graph above, we plot the number of militarized conflicts with fatalities that ended in each year since 1900. The graph is faceted by fatality level (1-25, 26-100, 101-250 deaths, and so on). Within each facet, disputes are scattered across the years, with a larger concentration during the war years and a sparser distribution at other times. Interestingly, there were conflicts resulting in more than 999 deaths in the years leading up to every major war. In four of those years, namely 1913, 1938, 1955 and 2001, we see spikes in the number of such conflicts. This may indicate that those conflicts drew other countries into the wars.
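The claim that high-fatality disputes cluster just before each war can also be checked numerically. Below is a sketch under stated assumptions. A hypothetical `disputes_before()` helper counts disputes with a given `Fatality` code whose `EndYear` falls in the `lag` years before each war's start year. Code 6 is "More than 999 deaths" in the facet labels above, and the rows are made up.

```r
# Hypothetical helper: for each war start year s, count disputes with the given
# fatality code that ended in [s - lag, s).
disputes_before <- function(end_year, fatality, war_start, level = 6, lag = 1) {
  sapply(war_start, function(s) {
    sum(fatality == level & end_year >= s - lag & end_year < s, na.rm = TRUE)
  })
}

# Made-up dispute rows; column names follow MIDA
mida <- data.frame(EndYear  = c(1913, 1913, 1912, 1938, 1940),
                   Fatality = c(6, 6, 2, 6, 6))
wars <- c(WWI = 1914, WWII = 1939)
counts <- disputes_before(mida$EndYear, mida$Fatality, wars)
print(counts)
```

Widening `lag` gives a rough sense of how far ahead of a war the build-up starts, though on its own it says nothing about causation.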
Settlement of Disputes Leading to War
# Overlay w/ rectangle theme
y2_high = 9
d = data.frame(x1 = c(1914, 1939, 1947, 1950, 1955, 2001),
               x2 = c(1918, 1945, 1991, 1953, 1975, 2010),
               y1 = 0, y2 = y2_high,  # recycled to every row
               Conflict = c("WWI", "WWII", "Cold War", "Korean War", "Vietnam War", "Afghanistan War"),
               r = 1:6)
#Adding Labels for Facet titles
facet_names <- as_labeller(c(
'1' = "Negotiated",
'2' = "Imposed",
'3' = "None",
'4' = "Unclear",
'-9' = "Missing Data"
))
ggplot() +
geom_bar(data = MIDA[MIDA$EndYear > 1900 & !MIDA$Settle %in% c(3, -9), ], aes(x = EndYear), stat = "count") +
facet_wrap(~Settle, nrow= 5, labeller=facet_names) +
scale_fill_manual(values = myPal) +
geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=y1, ymax=y2, fill=Conflict), alpha=0.15) +
ggtitle("Number Of Disputes With Settlement Outcomes By Year") +
xlab("End Year") +
ylab("Count")

In the graph above, we plot the settlements of militarized conflicts that ended in each year since 1900, faceted by settlement type. For simplicity, we left out disputes with missing data or with no settlement.
Surprisingly, we don't see any correlation between imposed or negotiated settlements and the onset of war. On reflection, the number of settlements alone is unlikely to be an important predictor of war; what matters more is their content. Some settlements, such as the Treaty of Versailles [Treaty of Versailles], which was punitive and complex, played a major role in shaping the political climate that gave rise to Nazi Germany. Because the data doesn't capture these qualitative aspects of settlements, this variable is of little use to our exploration.
Outcomes of Disputes Leading to War
# Overlay w/ rectangle theme
y2_high =65
d = data.frame(x1 = c(1914, 1939, 1947, 1950, 1955, 2001),
               x2 = c(1918, 1945, 1991, 1953, 1975, 2010),
               y1 = 0, y2 = y2_high,  # recycled to every row
               Conflict = c("WWI", "WWII", "Cold War", "Korean War", "Vietnam War", "Afghanistan War"),
               r = 1:6)
MIDA_o <- filter(MIDA, MIDA$Outcome %in% c(1,2,3,4,5,6,8,-9))
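# Merge the side-specific codes: victory for side B (2) counts as victory
# for either side (1), and yield by side B (4) as yield by either side (3)
war_outcome <- function(x){
  if (x == 2) return(1)
  if (x == 4) return(3)
  x
}
MIDA_o$WarOutcome <- sapply(MIDA_o$Outcome, war_outcome)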
#Adding Labels for Facet titles
facet_names <- as_labeller(c(
'1' = "Victory For Either Side",
'3' = "Yield By Either Side",
'5' = "Stalemate",
'6' = "Compromise",
'8' = "Unclear",
'-9' = "Missing Data"
))
ggplot() +
geom_bar(data = MIDA_o[MIDA_o$EndYear > 1900 & MIDA_o$WarOutcome != -9, ],
aes(x = EndYear), stat = "count") +
facet_wrap(~WarOutcome, nrow= 5, labeller=facet_names) +
scale_fill_manual(values = myPal) +
geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=y1, ymax=y2, fill=Conflict), alpha=0.15) +
ggtitle("Number Of Disputes With Outcomes By Year") +
xlab("End Year") +
ylab("Count")

In the graph above, we plot the outcomes of militarized conflicts that ended in each year since 1900, faceted by outcome. For simplicity, we left out disputes with missing data. There is little evidence that particular types of outcomes were predictive of a war. Surprisingly, though, stalemates peaked in the middle of ongoing wars: we see this in WWII, the Vietnam War and the War in Afghanistan. On further analysis, the spikes in stalemates during WWII and the Vietnam War fall in the years the US entered those wars. For WWII this is 1942, the first full year after the US declared war on Japan in December 1941 [The United States Declares War on Japan]. For the Vietnam War it is 1964, the year of the Gulf of Tonkin Resolution [Gulf of Tonkin Resolution]. The entry of a major military power may explain the resulting shift in the balance of power.
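One way to locate the stalemate peak inside each war period, without reading it off the chart, is to tabulate stalemates (Outcome code 5) by end year within each period and take the most frequent year. Below is a sketch with a hypothetical `peak_year()` helper and made-up rows; ties resolve to the earliest year.

```r
# Hypothetical helper: most frequent end year among disputes with the given
# outcome code inside [from, to]; NA if there are none.
peak_year <- function(end_year, outcome, from, to, code = 5) {
  yrs <- end_year[outcome == code & end_year >= from & end_year <= to]
  if (length(yrs) == 0) return(NA_integer_)
  tab <- table(yrs)
  as.integer(names(tab)[which.max(tab)])   # which.max picks the first of any ties
}

# Made-up dispute rows and two of the war periods used in this notebook
mida <- data.frame(EndYear = c(1940, 1942, 1942, 1942, 1944, 1964, 1964, 1970),
                   Outcome = c(5, 5, 5, 1, 5, 5, 5, 5))
wars <- data.frame(Conflict = c("WWII", "Vietnam War"),
                   x1 = c(1939, 1955), x2 = c(1945, 1975))
wars$peak <- mapply(function(a, b) peak_year(mida$EndYear, mida$Outcome, a, b),
                    wars$x1, wars$x2)
print(wars)
```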
Conclusions & Next Steps
The face of modern warfare has changed dramatically in the past few decades. With technological advances, militaries increasingly rely on electronic and remotely operated weapons that let them strike targets without any personnel on site. Cyber warfare is also widely used to conduct espionage and to sway opinion around the world. Countries maintain large arsenals of strategic weapons, such as intercontinental ballistic missiles [ICBM] and ballistic missile submarines [SSBN], as deterrents to international conflict. Having learned from the ravages of past wars, many countries also stand united in collectively punishing rogue nations through trade embargoes or other economic measures. Given this shift in military strategy, the MID data may need new dimensions in the future. The involvement of certain actors may be harder to prove or pinpoint, and the existing variables may no longer be valid predictors of an imminent war.
As a next step, we could examine how trade and economic interdependence affect peace across borders. Using the MIDB data set, we could drill down into the individual disputes between countries and their occurrence over time. Superimposing those disputes on the economic or trade ties of the same periods, or on the formation of other international alliances, could yield valuable findings.
Sources
“About the Correlates of War Project.” Correlates of War. N.p., 05 Apr. 2014. Web. 19 Apr. 2017.
“Battlefield: Vietnam.” PBS. Public Broadcasting Service, n.d. Web. 19 Apr. 2017.
“Cold War.” Encyclopædia Britannica. Encyclopædia Britannica, Inc., n.d. Web. 19 Apr. 2017.
Formal Alliances (v4.1). Gibler, Douglas M. 2009. International military alliances, 1648-2008. CQ Press.
“Formation of NATO.” History.com. A&E Television Networks, 2010. Web. 19 Apr. 2017.
“Gulf of Tonkin Resolution.” Wikipedia. Wikimedia Foundation, 18 Apr. 2017. Web. 19 Apr. 2017.
“ICBM - Intercontinental Ballistic Missile.” Wikipedia. Wikimedia Foundation, 19 Apr. 2017. Web. 20 Apr. 2017.
Jones, Daniel M., Stuart A. Bremer and J. David Singer. 1996. “Militarized Interstate Disputes, 1816-1992: Rationale, Coding Rules, and Empirical Patterns.” Conflict Management and Peace Science 15:163-213.
National Material Capabilities (v5.0). Singer, J. David, Stuart Bremer, and John Stuckey. (1972). “Capability Distribution, Uncertainty, and Major Power War, 1820-1965.” in Bruce Russett (ed) Peace, War, and Numbers, Beverly Hills: Sage, 19-48.
Palmer, Glenn, Vito D’Orazio, Michael Kenwick, and Matthew Lane. 2015. “The MID4 Data Set: Procedures, Coding Rules, and Description.” Conflict Management and Peace Science. Forthcoming.
“Persian Gulf War.” Encyclopædia Britannica. Encyclopædia Britannica, Inc., n.d. Web. 19 Apr. 2017.
“The Rio Pact at a Glance.” The New York Times. The New York Times, 20 Apr. 1982. Web. 19 Apr. 2017.
Royde-Smith, John Graham. “World War I.” Encyclopædia Britannica. Encyclopædia Britannica, Inc., 09 Dec. 2016. Web. 19 Apr. 2017.
Royde-Smith, John Graham. “World War II.” Encyclopædia Britannica. Encyclopædia Britannica, Inc., 03 Feb. 2017. Web. 19 Apr. 2017.
“SSBN - Ballistic Missile Submarine.” Wikipedia. Wikimedia Foundation, 08 Apr. 2017. Web. 20 Apr. 2017.
“Treaty of Versailles.” Encyclopædia Britannica. Encyclopædia Britannica, Inc., n.d. Web. 19 Apr. 2017.
“The United States Declares War on Japan.” History.com. A&E Television Networks, n.d. Web. 19 Apr. 2017.
“Vietnam War History.” History.com. A&E Television Networks, 2009. Web. 19 Apr. 2017.
Witte, Griff. “Afghanistan War.” Encyclopædia Britannica. Encyclopædia Britannica, Inc., 14 Oct. 2016. Web. 19 Apr. 2017.
---
title: "Tale of War"
output: html_notebook
author: "Cynthia Clement & Vineet Aguiar"
---
```{r, echo = FALSE}

## Load Packages 

library(ggplot2)
library(grid)
library(gridExtra)
library(tidyr)
library(dplyr)
library(viridis)
library(gtools)
library(RColorBrewer)


##import data

#NMC

NMC_orig = read.csv("./data/NMC_5_0/NMC_5_0.csv", sep= ",")

#Alliances 
dyad_al_0= read.csv("./data/version4.1_csv/alliance_v4.1_by_dyad.csv", sep= ",")
dyad_al_year = read.csv("./data/version4.1_csv/alliance_v4.1_by_dyad_yearly.csv", sep= ",")

member_alliances = read.csv("./data/version4.1_csv/alliance_v4.1_by_member.csv", sep= ",")
member_al_year = read.csv("./data/version4.1_csv/alliance_v4.1_by_member_yearly.csv", sep= ",")

dir_al_year= read.csv("./data/version4.1_csv/alliance_v4.1_by_directed_yearly.csv", sep= ",")
dir_al= read.csv("./data/version4.1_csv/alliance_v4.1_by_directed.csv", sep= ",")

## Events Data Frame 

d=data.frame(x1=c(1914,1939, 1947, 1950, 1955, 2001), x2=c(1918, 1945, 1991, 1953, 1975, 2010), Conflict=c("WWI", "WWII", "Cold War", "Korean War", "Vietnam War", "Afghanistan War"), r=c(1,2,3,4,5,6))

mylables <- c("No War" , "WWI", "WWII", "Cold War", "Korean War", "Vietnam War", "Afghanistan War")
```


## Introduction

Wars are complex events born of geopolitical, cultural or economic strife. They often span many years and ultimately cost lives, livelihoods and peace. During wars, countries quickly adopt ideologies, form allegiances, and reorder their economic and scientific priorities around a single-minded military focus. Although the causes of this breakdown of peace vary, is there a pattern that precedes it? Does the landscape change after the end of a prolonged conflict? Do certain actors benefit more? Do some lose more than others? And most importantly, are there important predictors of these epic events that change the course of history? 

We are particularly interested in the changes a country undergoes before and after it enters a war. We want to see how its alliances and strategies change, their impact on trade and commerce, and the economics at play. We also want to compare the characteristics of countries that won wars with those that lost. Our eventual goal is to find factors that indicate which countries will enter a war, and to see how these factors and predictors change over time. 

To limit our scope, we will explore the data with a particular emphasis on the United States of America and the wars it has fought since 1900. At various points, we may compare other countries with the US, and we will explore the data broadly to draw meaningful insights.  

For our timeline, we plan to look at events and activity leading up to, during and following the major US wars, namely:

* WWI: 1914-1918
* WWII: 1939-1945
* Cold War: 1947-1991
* Korean War: 1950-1953
* Vietnam War: 1955-1975
* War in Afghanistan: 2001-2010

Although the Cold War was a state of severe political tension between the Eastern Bloc and the Western Bloc rather than a direct war, there were many regional conflicts, and the threat of a large-scale military war was constant. We chose to include the Cold War in our analysis to see whether a threat of war looks different in the data from an actual war. 
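Because the Cold War overlaps the Korean and Vietnam Wars, any code that tags a year with a single period has to choose a precedence. The `war_year()` function in the MID section checks the Korean and Vietnam Wars before the Cold War. Below is a vectorized base-R sketch that uses the same periods and the same precedence. The `periods` table and the `label_period()` helper are our own illustration, not part of the notebook.

```r
# Periods from the timeline above. Rows listed earlier take precedence, so the
# Korean and Vietnam Wars win over the overlapping Cold War.
periods <- data.frame(
  label = c("WWI", "WWII", "Korean War", "Vietnam War", "Cold War", "War in Afghanistan"),
  start = c(1914, 1939, 1950, 1955, 1947, 2001),
  end   = c(1918, 1945, 1953, 1975, 1991, 2010),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label each year with at most one period
label_period <- function(year, periods) {
  out <- rep("No War", length(year))
  # Walk the rows in reverse so earlier (higher-precedence) rows overwrite later ones
  for (i in rev(seq_len(nrow(periods)))) {
    hit <- year >= periods$start[i] & year <= periods$end[i]
    out[hit] <- periods$label[i]
  }
  out
}

period_labels <- label_period(c(1913, 1916, 1948, 1952, 1960, 1985, 2005), periods)
print(period_labels)
```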


## The Data, Team Members and The Roles

The [Correlates of War Project](http://www.correlatesofwar.org) is a treasure trove of information. We are particularly interested in the following datasets: Trade, National Materials Capabilities (NMC), Alliances, and Militarized Interstate Disputes (MID).

**Our Plan**  
We've decided to divide and conquer the work by each taking a subset of the data and exploring it. After some time, we will regroup to see what we've learnt so far and switch data sets to see whether there are more insights to be gained or different ways to visualize the existing data. Lastly, we want to drill down into particular variables and plot correlations or predictors for the final output.

*Phase 1:*  
* Cynthia to analyze National Materials Capabilities and Alliances  
* Vineet to analyze Militarized Interstate Disputes and Trade  

*Phase 2:*  
* We are going to switch the data sets we are looking at to see if the other person can discern any new insights or creative ways of presenting the data.  
  + Cynthia to analyze Militarized Interstate Disputes and Trade  
  + Vineet to analyze National Materials Capabilities and Alliances   


## Analysis of Data Quality 

The Correlates of War datasets are a product of the Correlates of War Project (COW), founded in 1963. COW's goal is to "facilitate the collection, dissemination, and use of accurate and reliable quantitative data in international relations." [Correlates of War Project] From the COW datasets, we focused on four: National Materials Capabilities, Militarized Interstate Disputes, Alliances and Trade. Because these data sets are produced by the COW project, we consider their authority, accuracy and objectivity to be high. All the data sets use a country code field, a numeric ID assigned to each country. This field is consistent across the COW data sets and can be used to join them. 

**National Materials Capabilities**
  
The overall data quality of the NMC dataset is very good. There are roughly 14,000 entries, and 89% of them have no missing values. There is one entry for each country per year. The data is also accurate in tracking countries as they are dissolved and new ones are formed. For example, the graph below shows the CINC (Composite Index of National Capability) of Austria-Hungary from 1900 to 1918, when the Austro-Hungarian Empire was dissolved at the end of WWI. Immediately after that, you see separate data points for Austria and Hungary. The same holds true for many other countries, which only have data once they have declared independence or have been created. 
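The 89% figure depends on first treating the `-9` flag as missing. Below is a sketch of that check on made-up rows. A hypothetical `recode_missing()` helper turns `-9` into `NA` in the listed columns, and `complete.cases()` then gives the share of rows with no missing values. On the real data, the column list would be the seven capability columns that the chunk below recodes.

```r
# Hypothetical helper: replace the missing-value flag with NA in the given columns
recode_missing <- function(df, cols, flag = -9) {
  for (col in cols) df[[col]][df[[col]] == flag] <- NA
  df
}

# Made-up rows in the NMC layout (only two capability columns shown)
nmc <- data.frame(ccode = c(300, 305, 310, 305),
                  year  = c(1910, 1920, 1920, 1921),
                  milex = c(100, -9, 40, 50),
                  tpop  = c(50000, 6000, -9, 6100))
nmc <- recode_missing(nmc, c("milex", "tpop"))
complete_share <- mean(complete.cases(nmc))   # share of rows with no NA
print(complete_share)
```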

```{r fig.width=10, fig.height=5}
NMC_test <- NMC_orig
# Recode the -9 missing-value flag to NA in each capability column
for (col in c("cinc", "irst", "milex", "milper", "pec", "tpop", "upop")) {
  NMC_test[[col]][NMC_test[[col]] == -9] <- NA
}

NMC_test <- filter(NMC_test, year %in% 1900:2007)
test <- c("Austria-Hungary", "Austria", "Hungary")
test_ccode <- member_alliances$ccode[match(test, member_alliances$state_name)]

NMC_test <- filter(NMC_test, ccode %in% test_ccode)

ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="CINC") +
  labs(color ='Country Abbreviation')+
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=.09, fill=Conflict),alpha=0.15) +
  geom_line(data = NMC_test, aes(x = year, y = cinc, color = stateabb, group = stateabb)) +
  ggtitle("CINC for Austria-Hungary") + 
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5), legend.position = "bottom") 

```
  
Below is a heat map of CINC values for a few countries, which shows that CINC values are missing for certain countries and years. More specifically, the heat map shows the CINC values of the countries with the most Holocaust casualties, several of which later also had to fight for freedom from Communism. These countries had to go through a period of major reform and recovery. In the data set, the years in which a country was devastated tend to have missing CINC values.  


```{r, fig.align='center'}
topCas <- c("Netherlands", "Yugoslavia", "Lithuania", "Poland", "Austria", "Hungary", "Romania", "Estonia", "Luxembourg")
topCas_ccode <- member_alliances$ccode[match(topCas, member_alliances$state_name)]


NMC_cas <- filter(NMC_orig, NMC_orig$year %in% c(1900:2012))
NMC_cas <- filter(NMC_cas, NMC_cas$ccode %in% topCas_ccode)
# Attach a readable country name to each row by matching country codes
NMC_cas$country <- topCas[match(NMC_cas$ccode, topCas_ccode)]

ggplot(NMC_cas, aes(country, year, fill = cinc)) + geom_tile()+
  xlab("Country") +
  ylab("Year") +
  scale_fill_viridis() + 
  ggtitle("CINC Heatmap for Countries with the most Holocaust Casualties") + 
  theme_classic()+
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip() +
  labs(fill = 'CINC')
  

```


**Alliances**


The alliance data comes in six versions, each serving a different purpose. The member-level file contains one observation for each member's entry into an alliance, and a companion file contains one observation for each year a member is in the alliance. The dyadic file contains one observation for each dyadic alliance initiation, along with the details of the alliance. It also has a yearly companion file with one observation for each year the dyad is in the alliance. Finally, the directed-dyad file has two observations per alliance: one with the terms of the alliance from country A towards country B, and another with the terms from country B towards country A. As with the other files, there is also a yearly directed-dyad file with one observation for each year the alliance is in effect. All the data we looked at in our analysis is in line with historical events. 
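The directed-dyad files list each alliance twice (A towards B and B towards A), so counting their rows double-counts dyads. One way to collapse them is to keep only the orientation whose first country code is smaller, as sketched below on made-up rows. The `ccode2` column name is an assumption that mirrors the `ccode1` column used elsewhere in this notebook. This step also discards any asymmetry in the terms, so the undirected dyad files are the better source when you need undirected counts.

```r
# Made-up directed rows: each alliance appears once per direction
directed <- data.frame(
  ccode1 = c(2, 20, 2, 200),
  ccode2 = c(20, 2, 200, 2),
  year   = c(1949, 1949, 1949, 1949)
)

# Hypothetical helper: keep one orientation per pair (smaller code first)
to_undirected <- function(df) {
  und <- df[df$ccode1 < df$ccode2, ]
  rownames(und) <- NULL
  und
}

undirected <- to_undirected(directed)
print(undirected)
```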

One inconsistency we noticed concerns alliances still in effect as of December 31, 2012, when this data set was last updated. In some of the files, an ongoing alliance has dyad_end_year (the year in which the alliance was terminated) set to 2012, while in others it is set to NA. When dyad_end_year is 2012, it is impossible to tell whether the alliance ended in 2012 or was still ongoing.
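One way to sidestep the NA-versus-2012 inconsistency is to derive a status flag that treats both codings as ongoing. The cost is that alliances which genuinely ended in 2012 will be misclassified. Below is a sketch with a hypothetical `alliance_status()` helper, where 2012 is taken as the last coded year.

```r
# Hypothetical helper: NA end years and the final coded year (2012) both
# count as ongoing; true 2012 terminations cannot be distinguished.
alliance_status <- function(end_year, last_year = 2012) {
  ifelse(is.na(end_year) | end_year >= last_year, "Ongoing", "Ended")
}

status <- alliance_status(c(1945, NA, 2012, 1991))
print(status)
```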


**Militarized Interstate Disputes (MID)** 

The MID data is thorough and well collated. Because conflicts can have many dimensions, the MID data is split into two main data sets. MIDA contains one observation per conflict, with details such as the start and end of the conflict, the fatalities, the outcome, the settlement and the highest action taken. The other part of the dataset, MIDB, contains one observation per participant in a conflict, which adds a different dimension to the data.
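MIDA has one row per dispute and MIDB has one row per participant. Combining them therefore means aggregating MIDB to the dispute level and then joining on the dispute identifier. Below is a sketch on made-up rows. The key is called `DispNum` here as an assumption, so check the MID codebook for the identifier's actual name in your version of the files.

```r
# Made-up dispute-level (MIDA-like) and participant-level (MIDB-like) tables.
# DispNum is an assumed name for the shared dispute identifier.
mida <- data.frame(DispNum = c(1, 2), Fatality = c(6, 1))
midb <- data.frame(DispNum = c(1, 1, 1, 2, 2),
                   StAbb   = c("AAA", "BBB", "CCC", "DDD", "EEE"))

# Count participants per dispute, then attach the count to the dispute rows
midb$one <- 1
n_part <- aggregate(one ~ DispNum, data = midb, FUN = sum)
names(n_part)[2] <- "n_participants"
joined <- merge(mida, n_part, by = "DispNum")
print(joined)
```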


```{r fig.width=15, fig.height=6, echo = FALSE}
library(grid)
library(gridExtra)

MIDA = read.csv(file="./data/MID/MIDA_4.01.csv", sep= ",")

# The variables are integer codes, so count each code with geom_bar()
# rather than binning them with geom_histogram()
fatal <- ggplot() + geom_bar(data = MIDA[MIDA$Fatality != -9, ], aes(Fatality))
outcm <- ggplot() + geom_bar(data = MIDA[MIDA$Outcome != -9, ], aes(Outcome))
hiact <- ggplot() + geom_bar(data = MIDA[MIDA$HiAct != -9, ], aes(HiAct))
hostl <- ggplot() + geom_bar(data = MIDA[MIDA$HostLev != -9, ], aes(HostLev))
  
grid.arrange(fatal, outcm, hiact, hostl, nrow = 2)  

```
As seen above, the authors of the dataset categorized certain subjective variables and then encoded each category as a number. For instance, the "Outcome" of a dispute is coded as follows [Palmer, Glenn]:  
Victory for side A: 1  
Victory for side B: 2  
Yield by side A: 3  
Yield by side B: 4  
Stalemate: 5  
Compromise: 6  
Released: 7  
Unclear: 8  
Joins ongoing war: 9  
Missing: -9  
This encoding makes all recorded observations consistent, which made the data easier to work with.  
The data was also easy to work with because unclear or missing data points are explicitly coded (8 and -9 for Outcome), so we could easily filter them out depending on the context of each graph.  
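A lookup from code to label makes these codes easier to work with. Below is a sketch using a named character vector built from the list above; the `decode_outcome()` helper is hypothetical. Codes that are not in the table, such as `-9`, come back as `NA`.

```r
# Outcome codes from the MID coding listed above, keyed by the code as a string
outcome_labels <- c("1" = "Victory for side A", "2" = "Victory for side B",
                    "3" = "Yield by side A",    "4" = "Yield by side B",
                    "5" = "Stalemate",          "6" = "Compromise",
                    "7" = "Released",           "8" = "Unclear",
                    "9" = "Joins ongoing war")

# Hypothetical helper: map numeric codes to labels; unknown codes (e.g. -9) -> NA
decode_outcome <- function(code) {
  unname(outcome_labels[as.character(code)])
}

decoded <- decode_outcome(c(5, -9, 1))
print(decoded)
```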
  
## Executive Summary 
  
**National Materials Capabilities**

The National Materials Capabilities data measures a nation's power using six values: iron and steel production, military expenditures, military personnel, primary energy consumption, urban population and total population. Together they capture both the military and the economic power of a nation. We also looked at the ratios of the six metrics, calculated by summing the values of all countries for a given year and dividing each country's value by that total. The ratios normalize the trends, so we can see how a country's behavior changes relative to other countries.
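As a worked example of the ratios described above, each component is divided by its world total for the year, and COW's CINC score is the unweighted mean of a country's six shares. Below is a base-R sketch for two made-up countries in one year. It uses the NMC column names `milex`, `milper`, `irst`, `pec`, `tpop` and `upop`; the `add_shares()` helper and the `cinc_sketch` column are our own.

```r
components <- c("milex", "milper", "irst", "pec", "tpop", "upop")

# Made-up values for two countries in a single year
nmc <- data.frame(stateabb = c("AAA", "BBB"), year = 1938,
                  milex = c(300, 100), milper = c(50, 150),
                  irst = c(20, 20), pec = c(10, 30),
                  tpop = c(40, 60), upop = c(5, 15))

# Hypothetical helper: divide each component by its total within the same year
add_shares <- function(df, cols) {
  for (col in cols) {
    df[[paste0(col, "_share")]] <- df[[col]] / ave(df[[col]], df$year, FUN = sum)
  }
  df
}

nmc <- add_shares(nmc, components)
share_cols <- paste0(components, "_share")
nmc$cinc_sketch <- rowMeans(nmc[share_cols])   # CINC = mean of the six shares
print(nmc[c("stateabb", "cinc_sketch")])
```

Each share column sums to 1 within a year, so the CINC values of all countries in a year also sum to 1.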



```{r, echo = FALSE}
countries <- c("Germany")
country_code <- member_alliances$ccode[match(countries, member_alliances$state_name)]

NMC <- NMC_orig 
NMC$cinc[NMC$cinc == -9| is.na(NMC$cinc)] <- 0
NMC$irst[NMC$irst == -9| is.na(NMC$irst)] <- 0
NMC$milex[NMC$milex == -9| is.na(NMC$milex)] <- 0
NMC$milper[NMC$milper== -9| is.na(NMC$milper)] <- 0
NMC$pec[NMC$pec == -9| is.na(NMC$pec)] <- 0
NMC$tpop[NMC$tpop == -9| is.na(NMC$tpop)] <- 0
NMC$upop[NMC$upop == -9| is.na(NMC$upop)] <- 0

all_year <- c(1900:2007)

# For each year, divide each component (columns 4:9: milex, milper, irst,
# pec, tpop, upop) by the world total for that year, then stack the years
NMC_ratios <- do.call(rbind, lapply(all_year, function(year_t) {
  yr <- NMC[NMC$year == year_t, ]
  yr[4:9] <- lapply(yr[4:9], function(v) v / sum(v))
  yr
}))

NMC_execsum_ratios <- filter(NMC_ratios, NMC_ratios$year %in% c(1900:1950))
NMC_execsum_ratios <- filter(NMC_execsum_ratios, NMC_execsum_ratios$ccode %in% country_code)
NMC_execsum_ratios$country = ""
for( i in c(1:length(country_code))){
  NMC_execsum_ratios$country[NMC_execsum_ratios$ccode == country_code[i]] <- countries[i]
}

names(NMC_execsum_ratios) <- c("stateabb", "ccode" ,  "year",     "Expenditures"  , "Personnel",   "irst"   , "pec",      "tpop"  ,"upop",     "cinc" ,  "version",  "country")

d2=data.frame(x1=c(1914,1939), x2=c(1918, 1945), Conflict=c("WWI", "WWII"), r=c(1,2))

NMC_execsum_ratios <- gather(NMC_execsum_ratios, Type, Ratio, Expenditures:Personnel )

NMC_execsum_ratios_1 <- filter(NMC_execsum_ratios, NMC_execsum_ratios$year %in% c(1912:1918))  
NMC_execsum_ratios_1$Period <- "Reactive Investment"
NMC_execsum_ratios_2 <- filter(NMC_execsum_ratios, NMC_execsum_ratios$year %in% c(1932:1941)) 
NMC_execsum_ratios_2$Period<- "Proactive Investment"

 ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Ratio") +
  geom_rect(data=d2, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=.5, fill=Conflict),alpha=0.2) +
  geom_line(data = NMC_execsum_ratios, aes(x = year, y = Ratio), size = .5) +
  geom_line(data = NMC_execsum_ratios_1, aes(x = year, y = Ratio), size = .5, color = "palegreen3") +
  geom_line(data = NMC_execsum_ratios_2, aes(x = year, y = Ratio), size = .5, color = "orangered2") +
  ggtitle("Germany: Military Investment Ratios") + 
  theme_classic() +
  scale_fill_manual(values=c("lightsteelblue3", "pink3"))+
  theme(plot.title = element_text(hjust = .5),legend.position="bottom") + facet_wrap(~Type)


```

Above is a graph of Germany's military expenditures ratio and military personnel ratio from 1900 to 1950. We found that the military investment of a country that instigates a conflict is very different from that of a country that reacts to aggression. Germany is a good example because it was a reactive participant in WWI, only engaging once Austria-Hungary declared war on Serbia. It was an instigator in WWII, when it invaded Poland and officially started the war.

We found that for a reactive participant, the military expenditures and military personnel ratios increase during the war, not before. The part of the graph highlighted in green shows Germany increasing its military investment during WWI, with little indication of an increase before the war started. By contrast, the part highlighted in red shows Germany proactively increasing its military investment before WWII. Military expenditures rise sharply over the 10 years before the war, and military personnel rise even more sharply in the 1-2 years before it. We found this pattern in several countries that instigated wars. Although many factors determine whether a country will instigate a war, monitoring its military investment can help anticipate its actions in the near future.  
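The reactive versus proactive distinction can be phrased as a simple comparison: did a ratio grow more in the years before the war started, or during the war itself? The sketch below is a hypothetical heuristic, not a method from the COW data and not a validated predictor. The `investment_timing()` name, the 10-year window and both toy series are assumptions chosen to mirror the two periods highlighted in the graph.

```r
# Hypothetical heuristic: compare the change in a ratio over `window` years
# before the war with its change during the war. `series` is a numeric
# vector named by year and must contain every year the function looks up.
investment_timing <- function(series, war_start, war_end, window = 10) {
  val <- function(y) series[[as.character(y)]]
  pre    <- val(war_start) - val(war_start - window)  # growth before the war
  during <- val(war_end) - val(war_start)             # growth during the war
  if (pre > during) "proactive" else "reactive"
}

# Toy series: flat until the war, then rising (reactive)
reactive <- setNames(c(rep(0.10, 11), 0.15, 0.20, 0.25, 0.30), 1904:1918)
# Toy series: rising in the decade before the war, then flat (proactive)
proactive <- setNames(c(seq(0.05, 0.15, length.out = 11), rep(0.15, 6)), 1929:1945)

print(investment_timing(reactive, 1914, 1918))
print(investment_timing(proactive, 1939, 1945))
```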
 

**Alliances**

Alliances between nations are very important in determining the course of events. World War I can be attributed to the various alliances in place, which created a diplomatic chain of events that drew in all the world powers of the time. Below is a series of boxplots showing the number of alliances in effect per year since 1900; the colored boxplots mark the periods listed above. The median number of alliances per country increases towards the end of each war, with the exception of the Afghanistan War. We have found that alliances are reactive to wars. For example, the North Atlantic Treaty Organization (NATO) and the Warsaw Pact were created after WWII, in 1949 and 1955 respectively. Both alliances involved numerous countries and account for the increase in the number of alliances after WWII and at the beginning of the Cold War. We also found that once a military alliance is terminated or dissolved, countries try to join another treaty as quickly as possible. Towards the end of the Cold War, the Soviet Union dissolved and the Warsaw Pact with it. Many countries then joined NATO, which accounts for the increase in alliances towards the end of the Cold War. Overall, alliances are not a good predictor of events to come, because many diplomatic events happen in the background and are not captured in the data. Additionally, the United Nations now has considerable control over wars, so alliances alone are not sufficient to anticipate future events. 
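The boxplots summarize the distribution of per-country alliance counts for each year. Below is a base-R sketch of the two aggregation steps on made-up rows: count alliances per country per year, then take the median across countries. The chunk below also groups by `treaty_type`, so an alliance with several types contributes to several counts. This sketch leaves out that grouping for clarity.

```r
# Made-up directed alliance rows: one row per (country, partner, year)
al <- data.frame(
  state_name1 = c("A", "A", "A", "B", "C", "C", "C"),
  year        = c(1949, 1949, 1949, 1949, 1949, 1949, 1950),
  count       = 1
)

# Alliances in effect per country per year, then the median across countries
per_country   <- aggregate(count ~ state_name1 + year, data = al, FUN = sum)
yearly_median <- aggregate(count ~ year, data = per_country, FUN = median)
print(yearly_median)
```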

  
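```{r, echo = FALSE}

# Keep dyad-year alliance records from 1900-2012 and compute each alliance's length
dyad_al_year <- filter(dyad_al_year, year %in% 1900:2012)
dyad_al_year$length <- dyad_al_year$dyad_end_year - dyad_al_year$dyad_st_year
dyad_al_year$conflict <- "0"
dyad_al_year$count <- 1

# Reshape the treaty-type indicator columns (defense ... entente) into long format
dir_alliances <- gather(dir_al_year, treaty_type, indicator, defense:entente)
# Keep only the treaty types actually in force for each directed dyad-year
dir_alliances <- dir_alliances[!dir_alliances$indicator %in% 0, ]
# Alliances with no recorded end year are treated as ending in 2016
dir_alliances$dyad_end_year[is.na(dir_alliances$dyad_end_year)] <- 2016
dir_alliances <- dir_alliances[dir_alliances$year > 1900, ]
alliance_count <- dir_alliances[, c(2, 3, 14, 16)]
alliance_count$count <- 1

# Number of alliances per country, year and treaty type
gp_ct <- aggregate(count ~ ccode1 + state_name1 + year + treaty_type, data = alliance_count, FUN = sum)

# Tag each year with the index of the war period in `d` it falls in ("0" = no war)
gp_ct$Conflict <- "0"
for (i in seq_along(d$x1)) {
  gp_ct$Conflict[gp_ct$year >= d$x1[i] & gp_ct$year <= d$x2[i]] <- as.character(i)
}

```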
  
```{r fig.width=20, fig.height=8, echo = FALSE}
ggplot() +
  xlab("Year") +
  ylab("Count")+
  geom_boxplot(data = gp_ct, aes(x = as.factor(year), y = count, fill = Conflict)) +
  ggtitle("Total Alliances by Year") + 
  theme_classic()+
  theme(plot.title = element_text(hjust = .5, size = 20), axis.text.x = element_text(angle = 90, size = 10),  axis.text.y = element_text(size = 15),  axis.title.y = element_text(size = 15), axis.title.x = element_text(size = 15),  legend.position="bottom", legend.text = element_text(size=15),  legend.title = element_text(size=15))+
  scale_fill_manual(values=c("white", "lightsteelblue3", "pink3", "paleturquoise3", "lightsteelblue2", "lightsteelblue4","salmon" ), labels = mylables)

```



**Militarized Interstate Disputes**  
In the years leading up to a war, certain countries increased their military arsenals, recruited more personnel and tested their weapons and warfare strategies on a smaller scale against weaker opponents. Our exploration of Militarized Interstate Disputes shows exactly that. For instance, in the years leading up to WWII, Italy and Germany initiated the greatest number of disputes. This preparation and war-readiness likely positioned them for larger-scale encounters against more formidable foes. The graph below plots, for each year, the country (or countries) that initiated the most conflicts against the number of conflicts initiated, superimposed on the different war periods. To reduce clutter, we label only countries that initiated at least five conflicts in a year. As the plot shows, the top initiators of conflicts were often predisposed towards escalation or an impending war. As we examined the different dimensions of the data, we paid special attention to the boundary years surrounding each war. Based on our observations, the number of fatalities in conflicts during the years before a war suggested that international involvement in a war was imminent.
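
The per-year selection of top initiators can be sketched in base R. This is a minimal illustration on made-up counts, not the MID data: the country abbreviations, years and counts are hypothetical, the function name `top_initiators` is our own, and the label threshold of 5 mirrors the value used in the plotting code. Ties are kept, so a single year can contribute more than one country.

```r
# Made-up dispute-initiation counts (illustrative only, not MID data)
toy_orig <- data.frame(
  StAbb = c("GMY", "ITA", "JPN", "GMY", "ITA", "USA"),
  year  = c(1938, 1938, 1938, 1939, 1939, 1939),
  n     = c(7, 7, 3, 9, 4, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each year, keep every country whose count equals
# that year's maximum, so tied countries are all retained
top_initiators <- function(df) {
  year_max <- ave(df$n, df$year, FUN = max)
  out <- df[df$n == year_max, ]
  out[order(-out$year, out$StAbb), ]
}

# Label only countries at or above the threshold (empty label otherwise)
label_threshold <- 5
toy_top <- top_initiators(toy_orig)
toy_top$label <- ifelse(toy_top$n >= label_threshold, toy_top$StAbb, "")
print(toy_top)
```

Keeping ties avoids silently dropping a country that initiated exactly as many disputes as the leader in a given year.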

```{r fig.width=15, fig.height=7, echo=FALSE}
library(ggrepel)
library(RColorBrewer)

MIDB <- read.csv(file = "./data/MID/MIDB_4.01.csv", sep = ",")

# Keep only participants that originated disputes
MIDB_O <- filter(MIDB, Orig == 1)

# Count the disputes each country originated in each year (year stored in `vars`)
WarOrig <- MIDB_O %>% count(StAbb, sort = TRUE, vars = StYear)

# For each year in the range, keep the country (or tied countries)
# that originated the most disputes
start_yr <- 1901
end_yr <- 2010
WarTopOrig <- WarOrig %>%
  filter(vars >= start_yr, vars <= end_yr) %>%
  group_by(vars) %>%
  filter(n == max(n)) %>%
  ungroup() %>%
  arrange(desc(vars))

# War periods shaded on the plot
y2_high <- 30
d <- data.frame(
  x1 = c(1914, 1939, 1947, 1950, 1955, 2001),
  x2 = c(1918, 1945, 1991, 1953, 1975, 2010),
  y1 = 0,
  y2 = y2_high,
  Conflict = c("WWI", "WWII", "Cold War", "Korean War", "Vietnam War", "Afghanistan War"),
  r = 1:6
)

# Distinct color palette with one color per initiating country
cols <- colorRampPalette(brewer.pal(12, "Paired"))
myPal <- cols(length(unique(WarTopOrig$StAbb)))

# Only countries at or above this count get a label
conflictsInitiated <- 5
ggplot() +
  geom_point(data = WarTopOrig, aes(x = vars, y = n), size = 1, color = "red") +
  geom_label_repel(data = WarTopOrig,
                   aes(x = vars, y = n,
                       label = ifelse(n >= conflictsInitiated, as.character(StAbb), ""),
                       fill = ifelse(n >= conflictsInitiated, as.character(StAbb), "WWI")),
                   fontface = "bold", color = "white",
                   box.padding = unit(0.35, "lines"),
                   point.padding = unit(0.5, "lines"),
                   segment.color = "grey50") +
  geom_rect(data = d, mapping = aes(xmin = x1, xmax = x2, ymin = y1, ymax = y2, fill = Conflict),
            alpha = 0.15, show.legend = TRUE) +
  theme_classic(base_size = 16) +
  ggtitle("Countries Initiating the Most Conflicts Each Year") +
  xlab("Year") +
  ylab("Count") +
  theme(legend.position = "bottom") +
  guides(fill = guide_legend(title = "War Period",
                             title.position = "left",
                             nrow = 1)) +
  scale_fill_manual(
    breaks = c("WWI", "WWII", "Cold War", "Korean War", "Vietnam War", "Afghanistan War"),
    values = myPal)

```

  
## Main Analysis 

#### National Materials Capabilities

```{r}
NMC <- NMC_orig

# COW codes missing values as -9; replace both -9 and NA with 0
for (v in c("cinc", "irst", "milex", "milper", "pec", "tpop", "upop")) {
  NMC[[v]][NMC[[v]] == -9 | is.na(NMC[[v]])] <- 0
}

all_year <- c(1900:2007)

# Express each of the six CINC components as a share of that year's world total
NMC_ratios <- filter(NMC, year %in% all_year)
for (v in c("milex", "milper", "irst", "pec", "tpop", "upop")) {
  x <- as.numeric(NMC_ratios[[v]])
  NMC_ratios[[v]] <- x / ave(x, NMC_ratios$year, FUN = sum)
}

cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

```


National Materials Capability measures the power of a country based on 6 values: total population, urban population, military personnel, military expenditures, iron and steel production and energy consumption. NMC is purely a measure of military and economic means of influence rather than diplomacy or other forms of influence. 

CINC (Composite Index of National Capability) is a composite score that measures the power of a country as the average, across the six factors, of the ratio of the country's value to the total value of all countries in that year. Below is the CINC score for today's major powers, which also participated in the major wars of the past. The war periods discussed above are also highlighted.
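
The CINC calculation can be sketched as follows: for country $i$ in year $t$, $\text{CINC}_{it} = \frac{1}{6}\sum_{k=1}^{6} x_{kit} / \sum_j x_{kjt}$, where $x_{kit}$ is component $k$. The data frame below is made up, the helper name `compute_cinc` is hypothetical, and the sketch assumes no missing values; it illustrates the formula rather than reproducing the COW project's own code.

```r
# Made-up component values for two hypothetical countries in one year
toy_nmc <- data.frame(
  country = c("A", "B"),
  year    = c(1900, 1900),
  milex   = c(30, 10),
  milper  = c(20, 20),
  irst    = c(5, 15),
  pec     = c(40, 60),
  tpop    = c(100, 300),
  upop    = c(10, 30),
  stringsAsFactors = FALSE
)
cinc_components <- c("milex", "milper", "irst", "pec", "tpop", "upop")

# Hypothetical helper: each component as a share of that year's world total,
# then the unweighted mean of the six shares (assumes no missing values)
compute_cinc <- function(df, vars) {
  shares <- sapply(vars, function(v) df[[v]] / ave(df[[v]], df$year, FUN = sum))
  rowMeans(matrix(shares, nrow = nrow(df)))
}

toy_nmc$cinc <- compute_cinc(toy_nmc, cinc_components)
print(toy_nmc[, c("country", "cinc")])
```

Because each component's shares sum to 1 within a year, the CINC scores of all countries in a year also sum to 1, so one country's CINC can only rise if others fall.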


```{r fig.width=10, fig.height=5}
countries <- c("United States of America", "United Kingdom", "France", "Russia","Germany", "Japan", "China")
country_code <- member_alliances$ccode[match(countries, member_alliances$state_name)]

NMC_mp <- filter(NMC, NMC$year %in% all_year)
NMC_mp <- filter(NMC_mp, NMC_mp$ccode %in% country_code)
NMC_mp$country = ""
for( i in c(1:length(country_code))){
  NMC_mp$country[NMC_mp$ccode == country_code[i]] <- countries[i]
}

ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="CINC") +
  labs(color ='Country')+
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=0, ymax=max(NMC_mp$cinc), fill=Conflict),alpha=0.15) +
  geom_line(data = NMC_mp, aes(x = year, y = cinc, color = country, group = stateabb)) +
  ggtitle("CINC by Year for Major Powers Today") + 
  theme_classic()+
  scale_fill_manual(values=c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3"))+
  scale_color_brewer(palette="Paired")+
  theme(plot.title = element_text(hjust = .5), legend.position = "bottom") 
  



```


As the plot shows, CINC changes constantly. Interestingly, the US CINC spikes right after WWI and WWII, wars in whose outcome the US played a major role. The US CINC score also increases during the Korean War. Following the Korean War, the US CINC score decreases continuously through the Vietnam War, which the US lost. At the end of WWI, Russia's CINC dips but quickly returns to its pre-WWI level. During the Vietnam War, the USSR supported North Vietnam, and its CINC score rises above that of the US as the North Vietnamese gain ground in the war. Russia's CINC score also drops towards the end of the Cold War as its satellite states gained independence and the USSR was dissolved (Battlefield: Vietnam).

To explore the trends more, the data was refined to only look at the major participants of each war. For example, with WWI we looked at the CINC values for major players in the Allied Powers and Central Powers a few years before and after the war. We replicated this process for all the events listed above. Below are CINC graphs using this approach for WWI and WWII.



```{r fig.width=10, fig.height=5, echo = TRUE}
allied <- c("United States of America", "United Kingdom", "Russia", "Japan", "Italy")
allied_ccode <- member_alliances$ccode[match(allied, member_alliances$state_name)]

central <- c("Germany", "Turkey", "Austria-Hungary", "Romania", "Bulgaria")
central_ccode <- member_alliances$ccode[match(central, member_alliances$state_name)]

WWI_range = c(1904:1930)

WWI<- filter(NMC, NMC$year %in% WWI_range)
alliedP<- filter(WWI, WWI$ccode %in% allied_ccode)
alliedP$side = "Allied Powers"
for( i in c(1:length(allied_ccode))){
  alliedP$country[alliedP$ccode == allied_ccode[i]] <- paste(allied[i],"(Allied)", sep = " ")
}

centralP<- filter(WWI, WWI$ccode %in% central_ccode)
centralP$side = "Central Powers "
for( i in c(1:length(central_ccode))){
  centralP$country[centralP$ccode == central_ccode[i]] <- paste(central[i], "(Central)", sep = " ")
}
WWI <-rbind(alliedP, centralP)

ww1 <- ggplot() + 
  labs(color ='Country')+
  xlab("Year") +
  ylab("CINC")+
  geom_rect(data=d, mapping=aes(xmin=1914, xmax=1918, ymin=0, ymax=.4),alpha=0.05, fill ="salmon") +
  geom_line(data = WWI, aes(x = year, y = cinc, color = country, group = country)) + 
  facet_wrap(~side) + 
  theme_classic()+
  scale_color_brewer(palette="Paired")+
  ggtitle("CINC Score: WWI Major Players")+
   theme(plot.title = element_text(hjust = .5),legend.position="right")


allies <- c("United States of America", "United Kingdom", "France", "Russia", "Australia","China")
allies_ccode <- member_alliances$ccode[match(allies, member_alliances$state_name)]

axis <- c("Germany", "Italy", "Japan", "Hungary", "Romania", "Bulgaria")
axis_ccode <- member_alliances$ccode[match(axis, member_alliances$state_name)]

WWII_range = c(1934:1950)

WWII<- filter(NMC, NMC$year %in% WWII_range)
alliedP2<- filter(WWII, WWII$ccode %in% allies_ccode)
alliedP2$side = "Allies"
for( i in c(1:length(allies_ccode))){
  alliedP2$country[alliedP2$ccode == allies_ccode[i]] <- paste(allies[i], "(Allies)", sep = " ")
}

axisP<- filter(WWII, WWII$ccode %in% axis_ccode)
axisP$side = "Axis"
for( i in c(1:length(axis_ccode))){
  axisP$country[axisP$ccode == axis_ccode[i]] <- paste(axis[i], "(Axis)", sep = " ")
}

WWII <-rbind(alliedP2, axisP)

ww2 <- ggplot() + 
  labs(color ='Country')+
  xlab("Year") +
  ylab("CINC")+
  geom_rect(data=d, mapping=aes(xmin=1939, xmax=1945, ymin=0, ymax=.4),alpha=0.05, fill ="paleturquoise3") +
  geom_line(data = WWII, aes(x = year, y = cinc, color = country, group = stateabb)) + 
  facet_wrap(~side) + 
  theme_classic()+
  scale_color_brewer(palette="Paired")+
  ggtitle("CINC Score: WWII Major Players")+
  theme(plot.title = element_text(hjust = .5),legend.position="right")

```


```{r fig.width=10, fig.height=8}
grid.arrange(ww1, ww2, nrow=2)

```
As mentioned, the US CINC spiked right after WWI and then decreased steadily until the beginning of WWII. Russia's CINC dropped but rose again quickly and stayed on a relatively upward trend until WWII. Unlike the US and Russia, the United Kingdom's CINC decreased steadily after the war, while Italy's and Japan's CINC remained steady. Among the Central Powers, Germany's CINC dropped after WWI and did not rise again, while Turkey's, Romania's and Bulgaria's CINC remained steady. The Austria-Hungary series ends after the war because the Austro-Hungarian Empire was dissolved at the end of the war (Royde-Smith).

With WWII, we see a similar pattern for the US: its CINC peaks at the end of WWII and decreases steadily until the Korean War. Russia also shows a pattern similar to WWI, with its CINC score reaching a low point towards the end of WWII and then increasing steadily until the Korean War. The UK follows a similar pattern, with its CINC peaking right after the war and then decreasing steadily throughout the Cold War period. Among the Axis powers, Germany's and Japan's CINC drop off.


To explore the patterns above, we considered the components that make up CINC. We chose to focus on the major powers because they had the most drastic changes during this period. We looked at both the actual values and the ratios: the absolute values gradually increase over time, whereas the ratios show each country's performance relative to the other countries in the same year. The ratios revealed trends that were hard to spot in the overall values. Below are three of the six CINC components (iron and steel production, military expenditures and military personnel), shown as both values and ratios.


```{r fig.width=25, fig.height=20 , echo=FALSE}

# Fill colors for the war periods (matched to the Conflict levels in alphabetical order)
conflict_fills <- c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")

# Line plot of one NMC column per country with the war periods shaded.
# `var` is the column to plot and `label` is the y-axis title.
component_plot <- function(data, var, label, rect_alpha = 0.1) {
  data$value <- data[[var]]
  rects <- d
  rects$y_max <- max(data$value)
  ggplot() +
    scale_x_continuous(name = "Year") +
    scale_y_continuous(name = label) +
    geom_rect(data = rects, mapping = aes(xmin = x1, xmax = x2, ymin = 0, ymax = y_max, fill = Conflict),
              alpha = rect_alpha) +
    geom_line(data = data, aes(x = year, y = value, color = country, group = country)) +
    theme_classic() +
    scale_fill_manual(values = conflict_fills) +
    scale_color_brewer(palette = "Paired") +
    theme(plot.title = element_text(hjust = .5), legend.position = "none")
}

# Absolute values for the major powers
mex <- component_plot(NMC_mp, "milex", "Military Expenditures")
mip <- component_plot(NMC_mp, "milper", "Military Personnel")
nrg <- component_plot(NMC_mp, "pec", "Primary Energy Consumption")
ias <- component_plot(NMC_mp, "irst", "Iron and Steel Production")
urp <- component_plot(NMC_mp, "upop", "Urban Population")
top <- component_plot(NMC_mp, "tpop", "Total Population", rect_alpha = 0.15)

# Ratios (share of the world total each year) for the same countries
NMC_mp_ratios <- filter(NMC_ratios, year %in% all_year, ccode %in% country_code)
NMC_mp_ratios$country <- countries[match(NMC_mp_ratios$ccode, country_code)]

mex_r <- component_plot(NMC_mp_ratios, "milex", "Military Expenditures Ratio")
mip_r <- component_plot(NMC_mp_ratios, "milper", "Military Personnel Ratio")
nrg_r <- component_plot(NMC_mp_ratios, "pec", "Primary Energy Consumption Ratio")
ias_r <- component_plot(NMC_mp_ratios, "irst", "Iron and Steel Production Ratio")
urp_r <- component_plot(NMC_mp_ratios, "upop", "Urban Population Ratio")

# The total population ratio plot keeps its legend so it can be shared below the grid
top_r <- component_plot(NMC_mp_ratios, "tpop", "Total Population Ratio") +
  labs(color = "Country") +
  theme(legend.position = "bottom", legend.key.size = unit(1, "cm"),
        legend.title = element_text(size = 10), legend.text = element_text(size = 10))

# Extract the legend grob from a ggplot
g_legend <- function(a.gplot) {
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  tmp$grobs[[leg]]
}

mylegend <- g_legend(top_r)

```


```{r fig.width=16, fig.height=14}

grid.arrange(arrangeGrob(ias, ias_r, mex, mex_r, mip, mip_r, nrow = 3, ncol = 2), mylegend, heights = c(10, 1))
```


The US iron and steel production ratio follows the same pattern as its CINC during this period: it peaks towards the end of WWI and WWII and decreases in the period between the two wars. Until about the beginning of the Vietnam War (roughly 1955), the US dominated world iron and steel production, so this component had a large impact on the US CINC score. During most wars the US had the largest iron and steel production, but it lost this position towards the end of the Vietnam War, when Russia surpassed it. The United Kingdom maintained its iron and steel production, but because the US and Russia were increasing theirs, the UK's ratio has decreased steadily since 1900, much like its CINC score.

Looking at military expenditures, we see that the US and Russia invested significantly more in the military throughout the Cold War. On the other hand, both countries decreased their military personnel after it peaked at the end of WWII. These findings are consistent with the Cold War arms race, in which the US and Russia invested heavily in military technology but did not engage each other in large-scale battles. The US military expenditures ratio peaks in the same year as its CINC score and its iron and steel production ratio.

In the few years before WWII, Germany's military expenditures ratio increased quite rapidly, and its military personnel ratio rose drastically in the year before the war. Although less drastic, Japan and China follow a similar pattern: military investment increased significantly a few years before WWII and the Korean War, respectively, and the military personnel ratio rose sharply right before each war. This suggests that in the years before these wars, these countries were investing in and preparing their militaries for war. For the countries on the reactive side (the US, Russia and the UK), the military expenditures and military personnel ratios increased only during the war.
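
One way to make the proactive/reactive distinction more concrete is to compare a ratio's average annual change in the years before a war with its average annual change during the war. The sketch below is a hypothetical heuristic, not the method used to produce the plots: the five-year lead window, the function names `mean_annual_change` and `classify_buildup`, and the ratio values are all assumptions chosen for illustration.

```r
# Made-up military expenditures ratios for one hypothetical country
toy_ratio <- data.frame(
  year  = 1933:1945,
  ratio = c(0.02, 0.03, 0.05, 0.07, 0.10, 0.13, 0.15,
            0.15, 0.16, 0.16, 0.15, 0.14, 0.12)
)

# Average year-over-year change of the ratio within [from, to], inclusive
mean_annual_change <- function(df, from, to) {
  window <- df[df$year >= from & df$year <= to, ]
  window <- window[order(window$year), ]
  mean(diff(window$ratio))
}

# Hypothetical heuristic: "proactive" if the ratio grew faster in the
# `lead_years` before the war than during the war itself
classify_buildup <- function(df, war_start, war_end, lead_years = 5) {
  pre <- mean_annual_change(df, war_start - lead_years, war_start)
  during <- mean_annual_change(df, war_start, war_end)
  list(pre_war = pre, during_war = during,
       pattern = if (pre > during) "proactive" else "reactive")
}

toy_result <- classify_buildup(toy_ratio, war_start = 1939, war_end = 1945)
str(toy_result)
```

The classification is sensitive to the chosen lead window, so in practice it would be worth checking several window lengths before reading much into a single label.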


```{r fig.width=10, fig.height=3}
NMC_v<- filter(NMC_ratios, NMC_ratios$year %in% c(1970:2000))
NMC_ratios_v<- filter(NMC_ratios, NMC_ratios$year %in% c(1970:2000))
#names <- c("North Korea", "South Korea", "Afghanistan"  , "Vietnam", "Republic of Vietnam")
#code<- c(731, 732,700, 816, 817)
names <- c("Iraq")
code <- c(645)
NMC_v<- filter(NMC_v, NMC_v$ccode %in% code)
NMC_ratios_v<- filter(NMC_ratios_v, NMC_ratios_v$ccode %in% code)
for( i in c(1:length(code))){
  NMC_ratios_v$country[NMC_ratios_v$ccode == code[i] ]<- names[i]
  NMC_v$country[NMC_v$ccode == code[i] ]<- names[i]
}


a<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Ratio") + 
  geom_line(data = NMC_ratios_v, aes(x = year, y = milex), colour = "indianred3") +
  ggtitle("Iraq Military Expenditures Ratio") + 
  geom_rect(data=d, mapping=aes(xmin=1990, xmax=1991, ymin=0, ymax=.025),alpha=0.03) +
  theme_classic() +
  theme(plot.title = element_text(hjust = .5, size =10 ),   axis.text.x =element_text(size =6), axis.text.y =element_text(size =6), axis.title.x =element_text(size =8), axis.title.y =element_text(size =8), legend.position="none")

b<- ggplot() + 
  scale_x_continuous(name="Year") + 
  scale_y_continuous(name="Ratio") +
  geom_line(data = NMC_ratios_v, aes(x = year, y = milper), colour= "royalblue3") +
  ggtitle("Iraq Military Personnel Ratio") + 
  geom_rect(data=d, mapping=aes(xmin=1990, xmax=1991, ymin=0, ymax=.05), alpha=0.03) +
  theme_classic() +
  theme(plot.title = element_text(hjust = .5, size =10 ),   axis.text.x =element_text(size =6), axis.text.y =element_text(size =6), axis.title.x =element_text(size =8), axis.title.y =element_text(size =8), legend.position="none")

grid.arrange(a,b,nrow = 1)
```

We wanted to see whether the same pattern was present in other conflicts. In the Gulf War, Iraq invaded Kuwait, which drew international condemnation, and the US and other nations joined forces to stop Iraq (Persian Gulf War). Before the invasion, however, Iraq's military expenditures and personnel ratios were increasing. The gray shaded box indicates the period of the Gulf War (1990-1991). In the 1980s, Iraq's military expenditures ratio increased drastically, followed by a sudden spike in military personnel right before the start of the war.

```{r fig.width=14, fig.height=6}
ct <- c("United States of America", "United Kingdom", "France", "Russia","Germany", "Italy", "Japan", "China")
ct_ccode <- member_alliances$ccode[match(ct, member_alliances$state_name)]

NMC_ct <- filter(NMC, NMC$year %in% all_year)
NMC_ct <- filter(NMC_ct, NMC_ct$ccode %in% ct_ccode)
for( i in c(1:length(ct_ccode))){
  NMC_ct$country[NMC_ct$ccode == ct_ccode[i]] <- ct[i]
}

a<- ggplot(NMC_ct, aes(country, year, fill = cinc)) + geom_tile()+
  xlab("Country") +
  ylab("Year") +
  scale_fill_viridis() + 
  ggtitle("CINC Heatmap for Major Powers") + 
  theme_classic()+
  theme(plot.title = element_text(hjust = 0.5))+ 
  coord_flip() +
  labs(fill = 'CINC')


topCas <- c("Netherlands", "Yugoslavia", "Lithuania", "Poland", "Austria", "Hungary", "Romania", "Estonia", "Luxembourg")
topCas_ccode <- member_alliances$ccode[match(topCas, member_alliances$state_name)]


NMC_cas <- filter(NMC_orig, NMC_orig$year %in% all_year)
NMC_cas <- filter(NMC_cas, NMC_cas$ccode %in% topCas_ccode)
for( i in c(1:length(topCas_ccode))){
  NMC_cas$country[NMC_cas$ccode == topCas_ccode[i]] <- topCas[i]
}

b<- ggplot(NMC_cas, aes(country, year, fill = cinc)) + geom_tile()+
  xlab("Country") +
  ylab("Year") +
  scale_fill_viridis() + 
  ggtitle("CINC Heatmap for Countries with the most Holocaust Casualties") + 
  theme_classic()+
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip() +
  labs(fill = 'CINC')
  

grid.arrange(a,b, nrow = 1 )
```

Finally, using the heat maps, we looked at the impact of war on CINC. On the left is a heat map of the major powers. France, Germany and Japan have missing CINC values around World War II, and these gaps correspond to periods of occupation and recovery for those countries. Next, we looked for other countries with gaps in their CINC data. Most of these countries are in Eastern Europe and were heavily affected by the Holocaust and by the struggle to end Communism in the region (Royde-Smith). This suggests that countries going through intense destruction or reformation often have no CINC data for those periods.
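
Gaps like these can also be located programmatically by listing, for each country, the years between its first and last record that have no record. The sketch below uses made-up country-year rows; the function name `find_gaps` is hypothetical, and it only detects years with no row at all, not rows whose CINC is coded as missing.

```r
# Made-up country-year records: country "X" has no rows for 1940-1944
toy_records <- data.frame(
  country = c(rep("X", 6), rep("Y", 4)),
  year    = c(1937:1939, 1945:1947, 1937:1940),
  stringsAsFactors = FALSE
)

# Hypothetical helper: years between a country's first and last record
# that have no record; countries without gaps are dropped
find_gaps <- function(df) {
  gaps <- lapply(split(df$year, df$country), function(yrs) {
    full <- seq(min(yrs), max(yrs))
    setdiff(full, yrs)
  })
  gaps[lengths(gaps) > 0]
}

toy_gaps <- find_gaps(toy_records)
print(toy_gaps)
```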


**Conclusions & Next Steps**

To examine particular characteristics in more detail and context, we want to look at specific events, such as the attack on Pearl Harbor or Japan's invasion of Malaya, and other pivotal moments in wars to see how they affect CINC and its components.

A limitation of NMC is that many factors beyond the six NMC components determine the power of a nation. One major factor it does not consider is diplomatic relations between countries, which play a major role in the prevention and conclusion of conflicts. With this data, it was not possible to factor that in.

Another consideration is the difference in policy between countries. Military expenditures have been increasing for the US since the Cold War, but Russia's military expenditures drop suddenly at the end of the Cold War. Since the end of the Cold War, Russia has been cutting military spending (Royde-Smith). Even with its participation in the Afghanistan War, Russia's military expenditures have not increased. In contrast, US politicians today are proposing federal budgets with increased military spending. This difference likely reflects differences in the countries' policies. Thus, countries' reactions to events vary drastically with their policies, which makes it hard to distinguish an overall pattern.

Another drawback of NMC is that it cannot account for changes in global priorities. For example, with growing concern over climate change and scarce natural resources, and with advances in technology, iron and steel production might decrease drastically in the future and may no longer be a valid measure of power. Similarly, advances in technology could reduce the need for military personnel. NMC cannot take such policy concerns and changes into account when measuring national power.




#### Alliances


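```{r fig.width=14, fig.height=8}

# Keep dyad-year alliance records from 1900-2012 and compute each alliance's length
dyad_al_year <- filter(dyad_al_year, year %in% 1900:2012)
dyad_al_year$length <- dyad_al_year$dyad_end_year - dyad_al_year$dyad_st_year
dyad_al_year$conflict <- "0"
dyad_al_year$count <- 1

# Reshape the treaty-type indicator columns (defense ... entente) into long format
dir_alliances <- gather(dir_al_year, treaty_type, indicator, defense:entente)
# Keep only the treaty types actually in force for each directed dyad-year
dir_alliances <- dir_alliances[!dir_alliances$indicator %in% 0, ]
# Alliances with no recorded end year are treated as ending in 2016
dir_alliances$dyad_end_year[is.na(dir_alliances$dyad_end_year)] <- 2016
dir_alliances <- dir_alliances[dir_alliances$year > 1900, ]
alliance_count <- dir_alliances[, c(2, 3, 14, 16)]
alliance_count$count <- 1

# Number of alliances per country, year and treaty type
gp_ct <- aggregate(count ~ ccode1 + state_name1 + year + treaty_type, data = alliance_count, FUN = sum)

# Tag each year with the index of the war period in `d` it falls in ("0" = no war)
gp_ct$Conflict <- "0"
for (i in seq_along(d$x1)) {
  gp_ct$Conflict[gp_ct$year >= d$x1[i] & gp_ct$year <= d$x2[i]] <- as.character(i)
}

```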


WWI was triggered by the assassination of Archduke Franz Ferdinand of Austria. His death set off a diplomatic crisis in which countries that were not involved in the original conflict were drawn in. Once Austria-Hungary declared war on Serbia over the Archduke's death, Russia stepped in to defend Serbia. Once Russia entered the conflict, Germany entered due to its alliance with Austria-Hungary. During the conflict Germany invaded Belgium; in response, the United Kingdom entered the war, bound by its treaty guarantee of Belgian neutrality. This chain continued until it involved all the major powers of the world in a devastating war. These alliances were a major cause of World War I. Since then, the number of alliances has grown and continues to grow. For this reason, we wanted to look at alliances and see how they change during wars.

Below is a boxplot, for each year, of the number of alliances in effect for each country, counted separately by treaty type. The median number of alliances jumped significantly during WWII, continued to grow during the Cold War, and has remained relatively level since then. An interesting pattern is that the median number of alliances rose most sharply in the 1-3 years before the end of a war. This pattern appears for WWI, the Korean War, the Vietnam War and the end of the Cold War. Although the Cold War was primarily a state of intense political rivalry rather than direct war between the superpowers, there were many regional wars and the threat of a large-scale military war was constant. The number of alliances increased significantly from the start of the Cold War to its end.
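The middle line of each box is the per-year median of the counts in `gp_ct`, and because `gp_ct` has one row per country and treaty type, a country with alliances of several types contributes several values to a box. As a minimal sketch, assuming a toy data frame `gp_toy` (hypothetical values) laid out like `gp_ct`, the yearly medians and their year-over-year changes could be computed in base R; large positive changes correspond to the jumps described above.

```r
# Hypothetical toy data in the same layout as gp_ct: one row per country,
# year and treaty type, with the number of alliances in effect.
gp_toy <- data.frame(
  ccode1      = c(2, 200, 365, 2, 200, 365),
  year        = c(1940, 1940, 1940, 1941, 1941, 1941),
  treaty_type = "defense",
  count       = c(3, 1, 2, 5, 4, 2),
  stringsAsFactors = FALSE
)

# Median number of alliances per country in each year
# (the middle line of each box in the boxplot).
median_by_year <- aggregate(count ~ year, data = gp_toy, FUN = median)

# Year-over-year change in the median; large positive values mark the jumps.
median_by_year$change <- c(NA, diff(median_by_year$count))
print(median_by_year)
```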


```{r fig.width=20, fig.height=8}
ggplot() +
  xlab("Year") +
  ylab("Count") +
  geom_boxplot(data = gp_ct, aes(x = as.factor(year), y = count, fill = Conflict)) +
  ggtitle("Total Alliances by Year") +
  theme_classic() +
  theme(plot.title = element_text(hjust = .5, size = 20),
        axis.text.x = element_text(angle = 90, size = 10),
        axis.text.y = element_text(size = 15),
        axis.title.y = element_text(size = 15),
        axis.title.x = element_text(size = 15),
        legend.position = "bottom",
        legend.text = element_text(size = 15),
        legend.title = element_text(size = 15)) +
  scale_fill_manual(values = c("white", "lightsteelblue3", "pink3", "paleturquoise3",
                               "lightsteelblue2", "lightsteelblue4", "salmon"),
                    labels = mylables)

```
Next we looked at the types of alliances formed during this time. The COW data reports on 4 types of alliances: defense, neutrality, entente and non-aggression. In a defense alliance, the member states agree to defend one or more alliance members in the event of a conflict. In a neutrality alliance, the members agree to remain neutral toward one another. In a non-aggression alliance, the members agree to take no military action against one another. Finally, in an entente alliance the states agree to consult with one another if a crisis occurs (Formal Alliances).

The plots below show the number of alliances by alliance type. The top row shows the number of new alliances formed each year and the bottom row shows the number of alliances terminated each year. Note that if an alliance was formed between 4 states, there would be 6 new alliances in the data set, one for each pair of members. Similarly, terminating an alliance between 4 states would remove 6 alliances.
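The pair counting above is the number of unordered pairs among n members, n(n-1)/2. A minimal base R sketch, with hypothetical member names A to D, is shown below. If a file lists directed dyads (A to B and B to A as separate rows), each alliance instead contributes n(n-1) rows; the `dir_` prefix on the objects used here may indicate directed data, but that is an assumption to check against the COW codebook.

```r
# Number of dyad rows one multilateral alliance with n members produces.
undirected_dyads <- function(n) choose(n, 2)  # each unordered pair once
directed_dyads   <- function(n) n * (n - 1)   # A->B and B->A listed separately

print(undirected_dyads(4))  # 6
print(directed_dyads(4))    # 12

# Listing the pairs explicitly for four hypothetical members A-D.
members <- c("A", "B", "C", "D")
pairs <- t(combn(members, 2))
print(pairs)
```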



```{r fig.width=12, fig.height=8}
# Dyads whose alliance started between 1900 and 2012, one row per treaty type
dir_al_0 <- filter(dir_al, dyad_st_year %in% 1900:2012)
all_st <- gather(dir_al_0, treaty_type, indicator, defense:entente)
all_st <- all_st[!all_st$indicator %in% 0, ]
# Ongoing alliances have no end year; code them as 2016 so they can be excluded below
all_st$dyad_end_year[is.na(all_st$dyad_end_year)] <- 2016
al_st_count <- all_st[, c(3,5,8,11,15)]
al_st_count$count <- 1

gp_st <- aggregate(count ~ dyad_st_year + treaty_type, data = al_st_count, FUN = sum)

war_fill <- c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")

# Top row: alliances formed per year, by treaty type, with war periods shaded
a <- ggplot() +
  xlab("Year") +
  ylab("Count") +
  geom_bar(data = al_st_count, aes(x = dyad_st_year)) +
  geom_rect(data = d, mapping = aes(xmin = x1, xmax = x2, ymin = 0, ymax = 450, fill = Conflict), alpha = 0.2) +
  ggtitle("Number of Alliances Formed") +
  theme_classic() +
  theme(legend.position = "bottom", plot.title = element_text(hjust = .5)) +
  scale_fill_manual(values = war_fill) +
  facet_wrap(~treaty_type, nrow = 1)

# Bottom row: alliances terminated per year (ongoing alliances excluded)
b <- ggplot() +
  xlab("Year") +
  ylab("Count") +
  geom_bar(data = al_st_count[al_st_count$dyad_end_year < 2016, ], aes(x = dyad_end_year)) +
  geom_rect(data = d, mapping = aes(xmin = x1, xmax = x2, ymin = 0, ymax = 450, fill = Conflict), alpha = 0.2) +
  ggtitle("Number of Alliances Terminated") +
  theme_classic() +
  theme(legend.position = "bottom", plot.title = element_text(hjust = .5)) +
  scale_fill_manual(values = war_fill) +
  facet_wrap(~treaty_type, nrow = 1)

grid.arrange(a, b, nrow = 2)

```

Most of the alliances were formed at the end of WWII and during the Cold War, and the most frequently formed types were defense and entente. Surprisingly, years with a large increase in the number of alliances formed also see an increase in the number of alliances terminated. To better understand what types of treaties were formed and why they ended, we looked at individual countries.
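One way to put a number on the co-movement of formations and terminations is a rank correlation between the two yearly count series. The sketch below uses a hypothetical toy table `al_toy` with the same `dyad_st_year` and `dyad_end_year` columns as the alliance data; it is only an illustration, and on real data a shared dependence on war periods would also drive such a correlation.

```r
# Hypothetical toy alliance dyads with start and end years (NA = ongoing).
al_toy <- data.frame(
  dyad_st_year  = c(1945, 1945, 1946, 1949, 1949, 1949),
  dyad_end_year = c(1946, 1949, NA, NA, 1950, NA)
)
years <- 1945:1950

# Count alliances formed and terminated in every year, keeping zero years;
# ongoing alliances (NA end year) are not counted as terminations.
formed     <- table(factor(al_toy$dyad_st_year,  levels = years))
terminated <- table(factor(al_toy$dyad_end_year, levels = years))

yearly <- data.frame(year       = years,
                     formed     = as.vector(formed),
                     terminated = as.vector(terminated))

# Rank correlation between the two yearly series.
rho <- cor(yearly$formed, yearly$terminated, method = "spearman")
print(yearly)
print(rho)
```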

The chart below shows, for each alliance partner, a timeline from the start of the alliance until either its end (shown in red) or 2012, if the alliance was still in effect as of December 31, 2012 (shown in blue). The chart is faceted by alliance type because many alliance types overlap: a single alliance can be both a defense and an entente alliance, so we separated the types for a clearer visual representation. We focused this part of the analysis on the United States because it is a major power and is therefore involved in many of the military alliances throughout history.
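Separating overlapping types amounts to reshaping the 0/1 type columns from wide to long, which the chunks here do with `tidyr::gather()`. A base R sketch of the same idea follows; the toy data frame `dyads_toy` and its country names are hypothetical, and the four column names are assumed to match the COW files.

```r
# Hypothetical toy dyads: 0/1 indicator columns for the four COW alliance
# types (column names assumed), where one dyad can carry several types.
dyads_toy <- data.frame(
  state_name2   = c("Country A", "Country B", "Country C"),
  defense       = c(1, 1, 1),
  neutrality    = c(0, 0, 0),
  nonaggression = c(1, 0, 0),
  entente       = c(1, 0, 1),
  stringsAsFactors = FALSE
)
type_cols <- c("defense", "neutrality", "nonaggression", "entente")

# Wide-to-long: one row per (dyad, type); unlist() stacks the indicator
# columns in the same order as rep(type_cols, each = ...).
long_toy <- data.frame(
  state_name2 = rep(dyads_toy$state_name2, times = length(type_cols)),
  treaty_type = rep(type_cols, each = nrow(dyads_toy)),
  indicator   = unlist(dyads_toy[type_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Keep only the types that apply; a dyad now appears once per type,
# so faceting by treaty_type separates overlapping types.
long_toy <- long_toy[long_toy$indicator != 0, ]
print(table(long_toy$treaty_type))
```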



**United States of America**

```{r fig.width=12, fig.height=9}
# US alliances in long format, one row per partner, year and treaty type
al_us_yr <- filter(dir_al_year, state_name1 == "United States of America")
al_us_yr <- gather(al_us_yr, Treaty, indicator, defense:entente)
al_us_yr <- al_us_yr[!al_us_yr$indicator %in% 0, ]
# Ongoing alliances have no end year; code them as 2016
al_us_yr$dyad_end_year[is.na(al_us_yr$dyad_end_year)] <- 2016
al_us_yr <- al_us_yr[, c(1,5,8,11,14,16)]
al_us_yr$count <- 1
# Alliances ending before 2012 are "Ended"; those still in effect (including
# the 2016 placeholder for a missing end year) are "Ongoing"
al_us_yr$Status <- ifelse(al_us_yr$dyad_end_year < 2012, "Ended", "Ongoing")

ggplot() +
  xlab("Year") +
  ylab("Country") +
  geom_point(data = al_us_yr, aes(x = year, y = state_name2, color = Status), alpha = .5) +
  geom_rect(data = d, mapping = aes(xmin = x1, xmax = x2, ymin = "Afghanistan", ymax = "Zimbabwe", fill = Conflict), alpha = 0.15) +
  ggtitle("US Alliances") +
  theme_classic() +
  scale_fill_manual(values = c("salmon", "paleturquoise3", "lightsteelblue2", "lightsteelblue4", "lightsteelblue3", "pink3")) +
  facet_wrap(~Treaty, nrow = 1) +
  theme(legend.position = "right", plot.title = element_text(hjust = .5))

```

Looking at the alliances for the US, we see that most defense alliances are still in effect today. A handful of alliances with Latin American countries ended towards the end of WWII, but the US immediately entered a different alliance with those same countries. The treaty in effect with these countries is the Inter-American Treaty of Reciprocal Assistance (Rio Pact), under which an attack against one member is considered an attack against all members in the Americas. This alliance was signed in 1947 and remains in effect today (The Rio Pact at a Glance).

The defense, entente and nonaggression treaty types all show a similar pattern of ongoing alliances. NATO, a defense, entente and nonaggression alliance, was formed in 1949 and is still in effect today. NATO includes 28 countries, and its formation accounts for the high number of alliances formed in 1949 across these 3 types (Formation of NATO).

The entente alliances follow a similar pattern, in which an alliance ended and was immediately re-formed. For a few countries, an entente alliance was formed towards the end of the Korean War and ended a few years after the end of the Vietnam War. Most of the countries that follow this pattern are in Asia or Australia, which is reasonable considering that they were participants in the Vietnam War. Also, as the graph above shows, the defense and entente alliance between the US and Cuba ended during the Vietnam War, when Cuba was providing military support to North Vietnam. During the Vietnam War there was also a neutrality alliance, lasting a few years, between the US and countries that participated in the war. This was the International Agreement on the Neutrality of Laos, which started in 1961 and was terminated 2 years later when the Democratic Republic of Vietnam violated its terms (Vietnam War History).


**Conclusions & Next Steps**

Because we are focusing on large-scale wars that involved many countries, the period we studied saw many treaties created and broken. For example, the Warsaw Pact was created as a counterweight to NATO, which was formed after WWII. The US, Great Britain and their allies became part of NATO, while the Soviet Union and its allies became part of the Warsaw Pact. After the USSR dissolved, many former Warsaw Pact members joined NATO. With NATO and the United Nations in place, it is hard to say which countries will participate in the next war. For example, before the beginning of the War in Afghanistan, the Security Council had to authorize the United States and its NATO allies to organize an offensive against al-Qaeda (Witte). This type of regulation makes it hard to predict how future wars will play out. One interesting finding is that once a treaty falls apart, its members try to join another treaty, which may explain the spikes in the median number of alliances towards the end of the wars.

One difficulty with this data set is that it is impossible to tell which alliances were part of a larger treaty. For example, a data point for an alliance between the US and the UK in 1967 gives no indication of whether it was NATO or some other treaty. This also made it hard to tell when a country joined an existing alliance. For example, when Germany joined NATO there were data points for alliances between Germany and the NATO members, but it is not easy to discern that Germany joined NATO without some outside research.
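A rough heuristic can partly recover multilateral treaties from dyads: group dyads that start in the same year with the same type and check whether their states form every possible pair. The sketch below assumes undirected data with one row per pair; the toy table `dy_toy` and its state labels are hypothetical, and a complete set of pairs is only consistent with one treaty, it does not identify which treaty it was.

```r
# Hypothetical undirected dyads: partner names, start year and treaty type.
dy_toy <- data.frame(
  state1       = c("A", "A", "B", "A", "E", "E"),
  state2       = c("B", "C", "C", "D", "F", "G"),
  dyad_st_year = c(1949, 1949, 1949, 1947, 1960, 1960),
  treaty_type  = "defense",
  stringsAsFactors = FALSE
)

# Heuristic: group dyads that start in the same year with the same type and
# check whether the states involved form every possible pair. A complete set
# of pairs is consistent with (but does not prove) one multilateral treaty.
groups <- split(dy_toy, list(dy_toy$dyad_st_year, dy_toy$treaty_type), drop = TRUE)
candidates <- lapply(groups, function(g) {
  states <- unique(c(g$state1, g$state2))
  list(states   = states,
       complete = nrow(g) == choose(length(states), 2))
})
str(candidates)
```

In this toy example the 1960 group is not complete, which suggests two separate bilateral treaties rather than one three-member treaty.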

The other drawback of this data set is that it only considers formal military alliances. It does not consider other types of alliances, such as membership in the United Nations and its Security Council, or trade agreements. For example, much of the close relationship between Japan and the United States rests on trade and economic ties, which are not captured in the data set.

#### Militarized Interstate Disputes
  
The data is rich and contains many dimensions, such as the outcome, settlement, number of fatalities, minimum duration, highest action taken and hostility level of each militarized dispute. Hence, as a first step, we selected a few of these variables and plotted them in a parallel coordinate plot (PCP) to try to spot correlations, separately for disputes that started during wartime and during peacetime since 1900.
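The PCPs below use `scale = "uniminmax"`, which the GGally documentation describes as rescaling each variable on its own so its minimum maps to 0 and its maximum to 1. A base R sketch of that formula follows, using hypothetical Fatality codes. It also shows why MID's -9 "missing" code matters here: if left in, it becomes the minimum and pushes all valid codes toward the top of the axis. Recoding -9 to NA avoids this (ggparcoord's `missing` argument then controls how rows with NA are handled).

```r
# "uniminmax" scaling: each variable is rescaled on its own so that
# its minimum maps to 0 and its maximum to 1.
uniminmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical Fatality codes including the -9 "missing" code.
fatality <- c(-9, 0, 1, 3, 6)
print(uniminmax(fatality))     # valid codes squeezed into [0.6, 1]

# Recoding -9 as NA first spreads the valid codes over [0, 1].
fatality_na <- replace(fatality, fatality == -9, NA)
print(uniminmax(fatality_na))
```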
  
```{r fig.width=15, fig.height=10}
library(ggplot2)
library(dplyr)
library(grid)
library(gridExtra)
library(RColorBrewer)
library(GGally)

MIDA <- read.csv(file = "./data/MID/MIDA_4.01.csv", sep = ",")

# Label a dispute's start year with the war period it falls in. The Korean
# and Vietnam Wars are checked before the Cold War, so they take precedence.
war_year <- function(x) {
  if (x >= 1914 && x <= 1918) return("WWI")
  if (x >= 1939 && x <= 1945) return("WWII")
  if (x >= 1950 && x <= 1953) return("Korean War")
  if (x >= 1955 && x <= 1975) return("Vietnam War")
  if (x >= 1947 && x <= 1991) return("Cold War")
  if (x >= 2001 && x <= 2010) return("War in Afghanistan")
  "No War"
}

MIDA$wartime <- sapply(MIDA$StYear, war_year)

## PCP plot
# In MIDA 4.01, columns 9:11 and 14:16 are Outcome, Settle, Fatality,
# MinDur, HiAct and HostLev
pcp_cols <- c(9:11, 14:16)
alphabending <- 0.5

war <- ggparcoord(MIDA[MIDA$StYear > 1900 & MIDA$wartime != "No War", ],
                  columns = pcp_cols,
                  scale = "uniminmax",
                  alphaLines = alphabending,
                  groupColumn = "wartime",
                  title = "Correlations In War Time Conflicts") +
  theme_classic() +
  theme(legend.position = "bottom")

peace <- ggparcoord(MIDA[MIDA$StYear > 1900 & MIDA$wartime == "No War", ],
                    columns = pcp_cols,
                    scale = "uniminmax",
                    alphaLines = alphabending,
                    groupColumn = "wartime",
                    title = "Correlations In Peace Time Conflicts") +
  theme_classic() +
  theme(legend.position = "bottom")

grid.arrange(war, peace, nrow = 2)
```
In the war-time PCP, most of the lines for Fatality, Settlement and Outcome fall in the top half of their axes, and these three variables appear highly correlated. Minimum Duration is partly skewed, and the remaining variables are evenly distributed. In the peace-time PCP, the first three variables also appear fairly evenly distributed.
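A PCP hints at correlation through roughly parallel lines but does not say how strong it is. As a minimal sketch, a Spearman (rank) correlation matrix could quantify this for the ordinal variables in the plot; the toy data frame `mid_toy` is hypothetical and its column names are assumed to match MIDA. Outcome and Settle are codes for categories rather than ranked levels, so for them a cross-tabulation with `table()` would be a more meaningful check than a rank correlation.

```r
# Hypothetical toy disputes with MID-style ordinal codes (-9 = missing);
# column names assumed to match MIDA.
mid_toy <- data.frame(
  Fatality = c(0, 1, 3, 6, -9, 2),
  HiAct    = c(7, 16, 19, 21, 14, 17),
  HostLev  = c(2, 4, 4, 5, 3, 4)
)

# Treat -9 as missing before computing any statistic.
mid_toy[mid_toy == -9] <- NA

# Spearman (rank) correlation suits ordinal codes; pairwise deletion uses
# all complete observations for each pair of columns.
rho_mid <- cor(mid_toy, method = "spearman", use = "pairwise.complete.obs")
print(round(rho_mid, 2))
```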

Below, we dig a little deeper into the fatalities, settlements and outcomes of the disputes. Each graph is overlaid with the time periods of the major wars. Through this exploration, we try to spot possible predictors of a war.
  
**Fatalities In Disputes Leading to War**  

```{r fig.width=15, fig.height=10}

# Overlay w/ rectangle theme
y2_high =9
d = data.frame(x1=c(1914,1939, 1947, 1950, 1955, 2001), x2=c(1918, 1945, 1991, 1953, 1975, 2010), y1=c(0,0,0,0,0,0), y2=c(y2_high,y2_high,y2_high,y2_high,y2_high,y2_high), Conflict=c("WWI", "WWII", "Cold War", "Korean War", "Vietnam War", "Afghanistan War"), r=c(1,2,3,4,5,6))

#Adding Labels for Facet titles
facet_names <- as_labeller(c(
  '0' = "None",
  '1' = "1-25 deaths",
  '2' = "26-100 deaths",
  '3' = "101-250 deaths",
  '4' = "251-500 deaths",
  '5' = "501-999 deaths",
  '6' = "More than 999 deaths",
  '-9' = "Missing Data"
))

ggplot() + 
  geom_bar(data=MIDA[MIDA$EndYear>1900 & MIDA$Fatality!=0 & MIDA$Fatality!=-9,], aes(x = EndYear),   stat="count") +
  facet_wrap(~Fatality, nrow= 8, labeller=facet_names) +
  scale_fill_manual(values = myPal) +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=y1, ymax=y2, fill=Conflict), alpha=0.15) +
  ggtitle("Number Of Disputes With Fatalities By Year") +
  xlab("End Year") +
  ylab("Count")

```
In the graph above we have plotted the number of militarized disputes ending each year since 1900, faceted by fatality level (1-25, 26-100, 101-250 deaths and so on); disputes with no fatalities or missing data are left out. Within each facet, disputes are scattered across the period, with a larger concentration during the war years and a sparse distribution at other times. Interestingly, disputes resulting in more than 999 deaths occurred in the years leading up to every major war, and in four of those years, namely 1913, 1938, 1955 and 2001, the number of such disputes spikes. This may indicate that those conflicts drew other countries into the wars.
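The observation about the years leading up to each war can be checked directly by counting high-fatality disputes in a window before each war's start year. The sketch below is a minimal version under stated assumptions: `mid_fat_toy` is hypothetical toy data with MIDA-style `EndYear` and `Fatality` columns, the start years mirror the overlay table `d`, and the helper `pre_war_counts()` is a hypothetical name whose `window` argument sets how many years before the war are included.

```r
# Hypothetical toy disputes: end year and MID fatality level
# (6 = more than 999 deaths).
mid_fat_toy <- data.frame(
  EndYear  = c(1913, 1913, 1920, 1938, 1954, 1960),
  Fatality = c(6, 6, 2, 6, 4, 6)
)

# War start years, as in the overlay table d.
war_start <- c(WWI = 1914, WWII = 1939, `Cold War` = 1947,
               `Korean War` = 1950, `Vietnam War` = 1955,
               `Afghanistan War` = 2001)

# Count disputes at a given fatality level that ended within `window`
# years before each war started (window = 1 means the preceding year only).
pre_war_counts <- function(disputes, starts, window = 1, level = 6) {
  sapply(starts, function(y) {
    sum(disputes$EndYear >= y - window & disputes$EndYear < y &
          disputes$Fatality == level, na.rm = TRUE)
  })
}

counts <- pre_war_counts(mid_fat_toy, war_start)
print(counts)
```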

**Settlement of Disputes Leading to War**  
```{r fig.width=15, fig.height=6}

# Overlay w/ rectangle theme
y2_high = 9
d = data.frame(x1=c(1914,1939, 1947, 1950, 1955, 2001), x2=c(1918, 1945, 1991, 1953, 1975, 2010), y1=c(0,0,0,0,0,0), y2=c(y2_high,y2_high,y2_high,y2_high,y2_high,y2_high), Conflict=c("WWI", "WWII", "Cold War", "Korean War", "Vietnam War", "Afghanistan War"), r=c(1,2,3,4,5,6))


#Adding Labels for Facet titles
facet_names <- as_labeller(c(
  '1' = "Negotiated",
  '2' = "Imposed",
  '3' = "None",
  '4' = "Unclear",
  '-9' = "Missing Data"
))

ggplot() + 
  geom_bar(data=MIDA[MIDA$EndYear>1900 & MIDA$Settle!=3 & MIDA$Settle!='-9', ], aes(x = EndYear), stat="count") +
  facet_wrap(~Settle, nrow= 5, labeller=facet_names) +
  scale_fill_manual(values = myPal) +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=y1, ymax=y2, fill=Conflict), alpha=0.15) +
  ggtitle("Number Of Disputes With Settlement Outcomes By Year") +
  xlab("End Year") +
  ylab("Count")
```
In the graph above we have plotted the settlement types of militarized disputes ending each year since 1900, faceted by settlement type. For simplicity, we've left out disputes with missing data or with no settlement.
  
Surprisingly, we see no clear relationship between the number of imposed or negotiated settlements and the onset of war. On reflection, the number of settlements alone may not be an important predictor of war, because what matters is the content of a settlement. The Treaty of Versailles [Treaty of Versailles], for example, was punitive and complex in nature and played a major role in shaping the political climate that gave rise to Nazi Germany. Since the data does not capture these qualitative aspects of settlements, this variable is of little or no significance to our exploration.
  
**Outcomes of Disputes Leading to War**  

```{r fig.width=15, fig.height=10}

# Overlay w/ rectangle theme
y2_high =65
d = data.frame(x1=c(1914,1939, 1947, 1950, 1955, 2001), x2=c(1918, 1945, 1991, 1953, 1975, 2010), y1=c(0,0,0,0,0,0), y2=c(y2_high,y2_high,y2_high,y2_high,y2_high,y2_high), Conflict=c("WWI", "WWII", "Cold War", "Korean War", "Vietnam War", "Afghanistan War"), r=c(1,2,3,4,5,6))

MIDA_o <- filter(MIDA, MIDA$Outcome %in% c(1,2,3,4,5,6,8,-9))
war_outcome <- function(x){
  if(x ==2)
   return(1)
  if(x == 4)
   return(3)
  else
    return(x)
}

MIDA_o$WarOutcome <- sapply(MIDA_o$Outcome, war_outcome)


#Adding Labels for Facet titles
facet_names <- as_labeller(c(
  '1' = "Victory For Either Side",
  '3' = "Yield By Either Side",
  '5' = "Stalemate",
  '6' = "Compromise", 
  '8' = "Unclear",
  '-9' = "Missing Data"
))

ggplot() + 
  geom_bar(data=MIDA_o[MIDA_o$EndYear>1900 & MIDA_o$WarOutcome!='-9' , ], 
  aes(x = EndYear), stat="count") +
  facet_wrap(~WarOutcome, nrow= 5, labeller=facet_names) +
  scale_fill_manual(values = myPal) +
  geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=y1, ymax=y2, fill=Conflict), alpha=0.15) +
  ggtitle("Number Of Disputes With Outcomes By Year") +
  xlab("End Year") +
  ylab("Count")
```

In the graph above we have plotted the outcomes of militarized disputes ending each year since 1900, faceted by outcome. For simplicity, we've combined victories by either side and yields by either side, and left out disputes with missing data or with outcome codes not shown. There is little evidence that particular types of outcomes were predictive of a war. Surprisingly, however, stalemates peaked in the middle of ongoing wars: in WWII, the Vietnam War and the War in Afghanistan. On further analysis, the spikes in stalemates during WWII (1942) and the Vietnam War (1964) coincide with the start of major US involvement in both wars: the US declared war on Japan in December 1941 [The United States Declares War on Japan], and Congress passed the Gulf of Tonkin Resolution in 1964 [Gulf of Tonkin Resolution]. The entry of a major military power could explain the shift in the balance of power.
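Locating the peak year of stalemates inside each war window is a small computation. The sketch below assumes a hypothetical vector `stalemate_years` of dispute end years coded as stalemates and a hypothetical `wars_toy` table in the same shape as the overlay table `d`; ties go to the earliest year because `table()` sorts years ascending and `which.max()` returns the first maximum.

```r
# Hypothetical end years of disputes coded as stalemates.
stalemate_years <- c(1940, 1942, 1942, 1942, 1944, 1964, 1964, 1970, 2005)

# War windows in the same form as the overlay table d (subset).
wars_toy <- data.frame(Conflict = c("WWII", "Vietnam War", "Afghanistan War"),
                       x1 = c(1939, 1955, 2001),
                       x2 = c(1945, 1975, 2010),
                       stringsAsFactors = FALSE)

# For each war, the end year with the most stalemates inside its window
# (ties go to the earliest year, since table() sorts years ascending).
peak_year <- sapply(seq_len(nrow(wars_toy)), function(i) {
  in_war <- stalemate_years[stalemate_years >= wars_toy$x1[i] &
                              stalemate_years <= wars_toy$x2[i]]
  if (length(in_war) == 0) return(NA_integer_)
  counts <- table(in_war)
  as.integer(names(counts)[which.max(counts)])
})
names(peak_year) <- wars_toy$Conflict
print(peak_year)
```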

**Conclusions & Next Steps**  
The face of modern warfare has changed dramatically in the past few decades. With technological advances, militaries now rely on sophisticated electronic and remote warfare, which enables them to target their enemies without any personnel on site. Cyber warfare is also widely used to conduct espionage and to influence public opinion around the world. Countries maintain large arsenals of strategic weapons, such as Intercontinental Ballistic Missiles [ICBM] and nuclear-powered ballistic missile submarines [SSBN], which serve as important deterrents to international conflict. And having learned from the ravages of past wars, many countries unite to punish rogue nations collectively through trade embargoes and other economic sanctions. With such a shift in military strategy, the MID data may need to capture new dimensions in the future. The involvement of certain actors may be harder to prove or pinpoint, and the existing variables may no longer serve as valid predictors of an imminent war.

As a next step in this exploration, we could consider how trade and economic interdependence affect peace between countries. Using the MIDB data set, we could drill down into the individual disputes between countries and how they occur over time. Superimposing those disputes on economic or trade ties during the same time periods, or on the formation of other international alliances, could yield valuable findings.
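Because every COW data set shares the country code field, such an overlay is mostly a join on country code and year. The sketch below is a minimal version under stated assumptions: `disputes_toy` stands in for country-year dispute counts already summarized from MIDB, `trade_toy` for a country-year trade measure, and all values and the `n_disputes`/`total_trade` column names are hypothetical.

```r
# Hypothetical country-year dispute counts (e.g. summarized from MIDB) and a
# hypothetical trade measure, both keyed by COW country code and year.
disputes_toy <- data.frame(ccode      = c(2, 2, 365),
                           year       = c(1990, 1991, 1990),
                           n_disputes = c(3, 1, 2))
trade_toy <- data.frame(ccode       = c(2, 2, 365, 365),
                        year        = c(1990, 1991, 1990, 1991),
                        total_trade = c(900, 950, 120, 100))

# Left join on the shared keys; country-years without disputes get 0.
merged_toy <- merge(trade_toy, disputes_toy, by = c("ccode", "year"), all.x = TRUE)
merged_toy$n_disputes[is.na(merged_toy$n_disputes)] <- 0

# A first look at whether trade volume and dispute counts move together.
print(merged_toy)
print(cor(merged_toy$total_trade, merged_toy$n_disputes, method = "spearman"))
```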
  
  
## Sources 

"About the Correlates of War Project." Correlates of War. N.p., 05 Apr. 2014. Web. 19 Apr. 2017.  
  
"Battlefield: Vietnam." PBS. Public Broadcasting Service, n.d. Web. 19 Apr. 2017.  
  
"Cold War." Encyclopædia Britannica. Encyclopædia Britannica, Inc., n.d. Web. 19 Apr. 2017.  
  
Formal Alliances (v4.1). Gibler, Douglas M. 2009. International military alliances, 1648-2008. CQ Press.     
   
"Formation of NATO." History.com. A&E Television Networks, 2010. Web. 19 Apr. 2017.  

"Gulf of Tonkin Resolution." Wikipedia. Wikimedia Foundation, 18 Apr. 2017. Web. 19 Apr. 2017.

"ICBM - Intercontinental Ballistic Missile." Wikipedia. Wikimedia Foundation, 19 Apr. 2017. Web. 20 Apr. 2017.

Jones, Daniel M., Stuart A. Bremer and J. David Singer. 1996. "Militarized Interstate Disputes, 1816-1992: Rationale, Coding Rules, and Empirical Patterns." Conflict Management and Peace Science 15:163-213.

National Material Capabilities (v5.0). Singer, J. David, Stuart Bremer, and John Stuckey. (1972). "Capability Distribution, Uncertainty, and Major Power War, 1820-1965." in Bruce Russett (ed) Peace, War, and Numbers, Beverly Hills: Sage, 19-48.  

Palmer, Glenn, Vito D'Orazio, Michael Kenwick, and Matthew Lane. 2015. "The MID4 Data Set: Procedures, Coding Rules, and Description." Conflict Management and Peace Science. Forthcoming.
     
"Persian Gulf War." Encyclopædia Britannica. Encyclopædia Britannica, Inc., n.d. Web. 19 Apr. 2017. 
  
"The Rio Pact at a Glance." The New York Times. The New York Times, 20 Apr. 1982. Web. 19 Apr. 2017.  
   
Royde-Smith, John Graham. "World War I." Encyclopædia Britannica. Encyclopædia Britannica, Inc., 09 Dec. 2016. Web. 19 Apr. 2017.  
   
Royde-Smith, John Graham. "World War II." Encyclopædia Britannica. Encyclopædia Britannica, Inc., 03 Feb. 2017. Web. 19 Apr. 2017.

"SSBN - Ballistic Missile Submarine." Wikipedia. Wikimedia Foundation, 08 Apr. 2017. Web. 20 Apr. 2017.
    
"The United States Declares War on Japan." History.com. A&E Television Networks, n.d. Web. 19 Apr. 2017.

"Vietnam War History." History.com. A&E Television Networks, 2009. Web. 19 Apr. 2017.  
  
Witte, Griff. "Afghanistan War." Encyclopædia Britannica. Encyclopædia Britannica, Inc., 14 Oct. 2016. Web. 19 Apr. 2017.  

"Treaty of Versailles." Encyclopædia Britannica. Encyclopædia Britannica, Inc., n.d. Web. 19 Apr. 2017.









